Enzyme structure correlates with variant effect predictability

Floris van der Flier; Dave Estell; Sina Pricelius; Lydia Dankmeyer; Sander van Stigt Thans; Harm Mulder; Rei Otsuka; Frits Goedegebuur; Laurens Lammerts; Diego Staphorst; Aalt D.J. van Dijk; Dick de Ridder; Henning Redestig

Computational and Structural Biotechnology Journal (Dec 2024)

Affiliations

Floris van der Flier: Department of Plant Sciences, Wageningen University & Research, Wageningen, 6708 PB, the Netherlands
Dave Estell: Health & Biosciences, International Flavors and Fragrances, Palo Alto, 94304 CA, USA
Sina Pricelius: Health & Biosciences, International Flavors and Fragrances, Oegstgeest, 2342 BG, the Netherlands
Lydia Dankmeyer: Health & Biosciences, International Flavors and Fragrances, Oegstgeest, 2342 BG, the Netherlands
Sander van Stigt Thans: Health & Biosciences, International Flavors and Fragrances, Oegstgeest, 2342 BG, the Netherlands
Harm Mulder: Health & Biosciences, International Flavors and Fragrances, Oegstgeest, 2342 BG, the Netherlands
Rei Otsuka: Health & Biosciences, International Flavors and Fragrances, Oegstgeest, 2342 BG, the Netherlands
Frits Goedegebuur: Health & Biosciences, International Flavors and Fragrances, Oegstgeest, 2342 BG, the Netherlands
Laurens Lammerts: Health & Biosciences, International Flavors and Fragrances, Oegstgeest, 2342 BG, the Netherlands
Diego Staphorst: Health & Biosciences, International Flavors and Fragrances, Oegstgeest, 2342 BG, the Netherlands
Aalt D.J. van Dijk: Department of Plant Sciences, Wageningen University & Research, Wageningen, 6708 PB, the Netherlands
Dick de Ridder: Department of Plant Sciences, Wageningen University & Research, Wageningen, 6708 PB, the Netherlands
Henning Redestig: Health & Biosciences, International Flavors and Fragrances, Oegstgeest, 2342 BG, the Netherlands; Corresponding author.

Journal volume & issue: Vol. 23, pp. 3489–3497

Abstract

Protein engineering increasingly relies on machine learning models to computationally pre-screen promising novel candidates. Although machine learning approaches have proven effective, their performance on prospective screening data leaves room for improvement: prediction accuracy can vary greatly from one protein variant to the next. So far, it is unclear what characterizes variants that are associated with large prediction error. To establish whether structural characteristics influence predictability, we created a novel high-order combinatorial dataset for an enzyme, spanning 3,706 variants, that can be partitioned into subsets of variants with mutations at positions belonging exclusively to a particular structural class. By training four different supervised variant effect prediction (VEP) models on structurally partitioned subsets of our data, we found that predictability depended strongly on all four structural characteristics we tested: buriedness, number of contact residues, proximity to the active site, and presence of secondary structure elements. These dependencies were also found in several single-mutation enzyme variant datasets, albeit with dataset-specific directions. Most importantly, we found that these dependencies were similar for all four models we tested, indicating that there are specific structure and function determinants that are insufficiently accounted for by current machine learning algorithms. Overall, our findings suggest that VEP models can be improved by exploring new inductive biases and by leveraging different data modalities of protein variants, and that stratified dataset design can highlight areas of improvement for machine-learning-guided protein engineering.
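
As an illustration of the partitioning idea described in the abstract, the following is a minimal sketch of how variants might be grouped so that every mutated position in a variant belongs to the same structural class. It is not the authors' code: the function name, the class labels ("buried", "exposed"), the data layout and the toy positions are all assumptions made for this example.

```python
from collections import defaultdict


def partition_by_structural_class(variants, position_class):
    """Group variants whose mutated positions all share one structural class.

    variants: dict mapping a variant id to a list of mutated positions.
    position_class: dict mapping a position to a class label.
    Variants that mix classes, touch unlabeled positions, or carry no
    mutations are left out of every subset.
    (Hypothetical helper; names and data layout are assumptions.)
    """
    subsets = defaultdict(list)
    for variant_id, positions in variants.items():
        # Collect the distinct classes of all mutated positions.
        classes = {position_class.get(p) for p in positions}
        # Keep the variant only if exactly one known class is present.
        if len(classes) == 1 and None not in classes:
            subsets[classes.pop()].append(variant_id)
    return dict(subsets)


# Toy data, invented for illustration only.
position_class = {10: "buried", 25: "buried", 40: "exposed", 77: "exposed"}
variants = {
    "v1": [10, 25],  # all buried
    "v2": [40],      # all exposed
    "v3": [10, 40],  # mixed classes, excluded
    "v4": [40, 77],  # all exposed
}

subsets = partition_by_structural_class(variants, position_class)
print(subsets)
```

Each resulting subset could then be used to train and evaluate a model separately, so that prediction error can be compared across structural classes.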

Published in Computational and Structural Biotechnology Journal

ISSN: 2001-0370 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.journals.elsevier.com/computational-and-structural-biotechnology-journal
