Combining structure and sequence information allows automated prediction of substrate specificities within enzyme families.

Marc Röttig; Christian Rausch; Oliver Kohlbacher

doi:10.1371/journal.pcbi.1000636

PLoS Computational Biology (Jan 2010)

Journal volume & issue: Vol. 6, no. 1
p. e1000636

Abstract

An important aspect of the functional annotation of enzymes is not only the type of reaction catalysed by an enzyme, but also the substrate specificity, which can vary widely within the same family. In many cases, prediction of family membership and even substrate specificity is possible from enzyme sequence alone, using a nearest neighbour classification rule. However, the combination of structural information and sequence information can improve the interpretability and accuracy of predictive models. The method presented here, Active Site Classification (ASC), automatically extracts the residues lining the active site from one representative three-dimensional structure and the corresponding residues from sequences of other members of the family. From a set of representatives with known substrate specificity, a Support Vector Machine (SVM) can then learn a model of substrate specificity. Applied to a sequence of unknown specificity, the SVM can then predict the most likely substrate. The models can also be analysed to reveal the underlying structural reasons determining substrate specificities and thus yield valuable insights into mechanisms of enzyme specificity. We illustrate the high prediction accuracy achieved on two benchmark data sets and the structural insights gained from ASC by a detailed analysis of the family of decarboxylating dehydrogenases. The ASC web service is available at http://asc.informatik.uni-tuebingen.de/.
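The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the ASC implementation: the sequences, labels, active-site positions, and function names below are all hypothetical, and a simple nearest-neighbour rule on Hamming distance stands in for the SVM so that the example needs only the Python standard library. It assumes the family members are already aligned to the reference sequence and that the active-site residues are given as 0-based indices into the ungapped reference.

```python
from typing import List, Tuple


def alignment_columns(aligned_ref: str, ref_positions: List[int]) -> List[int]:
    """Map 0-based residue indices of the ungapped reference to alignment columns."""
    wanted = set(ref_positions)
    columns = {}
    residue_index = -1
    for col, char in enumerate(aligned_ref):
        if char != "-":  # gaps do not advance the residue counter
            residue_index += 1
            if residue_index in wanted:
                columns[residue_index] = col
    missing = wanted - set(columns)
    if missing:
        raise ValueError(f"positions not present in reference: {sorted(missing)}")
    return [columns[p] for p in ref_positions]


def signature(aligned_seq: str, columns: List[int]) -> str:
    """Collect the residues of one aligned sequence at the active-site columns."""
    return "".join(aligned_seq[c] for c in columns)


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length signatures."""
    return sum(x != y for x, y in zip(a, b))


def predict_nearest(query_sig: str, labelled: List[Tuple[str, str]]) -> str:
    """Return the label of the training signature closest to the query.

    Stand-in for the SVM in the abstract; ties go to the first closest entry.
    """
    _, label = min(labelled, key=lambda item: hamming(query_sig, item[0]))
    return label


# Hypothetical toy alignment; the first sequence plays the role of the
# representative with a known three-dimensional structure.
aligned_ref = "MK-LVDG-ARS"
active_site = [2, 4, 7]  # assumed active-site residues (L, D, R) in the ungapped reference

cols = alignment_columns(aligned_ref, active_site)

training = [
    ("MKALVDGTARS", "substrate_A"),
    ("MK-IVEG-AKS", "substrate_B"),
]
labelled_sigs = [(signature(seq, cols), label) for seq, label in training]

query_sig = signature("MR-LVDGSAKS", cols)
prediction = predict_nearest(query_sig, labelled_sigs)
print(query_sig, prediction)
```

Restricting the comparison to the active-site columns means that differences elsewhere in the sequence do not influence the prediction, which is the intuition behind combining the structure with the sequences. In practice the signatures would be encoded numerically and passed to an SVM, as the abstract describes.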

Published in PLoS Computational Biology

ISSN: 1553-734X (Print); 1553-7358 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Science: Biology (General)
Website: https://journals.plos.org/ploscompbiol/