Prediction and characterization of enzymatic activities guided by sequence similarity and genome neighborhood networks
Suwen Zhao,
Ayano Sakai,
Xinshuai Zhang,
Matthew W Vetting,
Ritesh Kumar,
Brandan Hillerich,
Brian San Francisco,
Jose Solbiati,
Adam Steves,
Shoshana Brown,
Eyal Akiva,
Alan Barber,
Ronald D Seidel,
Patricia C Babbitt,
Steven C Almo,
John A Gerlt,
Matthew P Jacobson
Affiliations
Suwen Zhao
Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, United States
Ayano Sakai
Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, United States
Xinshuai Zhang
Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, United States
Matthew W Vetting
Department of Biochemistry, Albert Einstein College of Medicine, New York, United States
Ritesh Kumar
Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, United States
Brandan Hillerich
Department of Biochemistry, Albert Einstein College of Medicine, New York, United States
Brian San Francisco
Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, United States
Jose Solbiati
Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, United States
Adam Steves
Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
Shoshana Brown
Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
Eyal Akiva
Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
Alan Barber
Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
Ronald D Seidel
Department of Biochemistry, Albert Einstein College of Medicine, New York, United States
Patricia C Babbitt
Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
Steven C Almo
Department of Biochemistry, Albert Einstein College of Medicine, New York, United States
John A Gerlt
Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, United States; Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, United States; Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, United States
Matthew P Jacobson
Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, United States
Metabolic pathways in eubacteria and archaea are often encoded by operons and/or gene clusters (genome neighborhoods) that provide important clues for assigning both enzyme functions and metabolic pathways. We describe a bioinformatic approach (genome neighborhood network; GNN) that enables large-scale prediction of the in vitro enzymatic activities and in vivo physiological functions (metabolic pathways) of uncharacterized enzymes in protein families. We demonstrate the utility of the GNN approach by predicting in vitro activities and in vivo functions in the proline racemase superfamily (PRS; InterPro IPR008794). The predictions were verified by measuring in vitro activities for 51 proteins in 12 families in the PRS that represent ~85% of the sequences; in vitro activities of pathway enzymes, carbon/nitrogen source phenotypes, and/or transcriptomic studies confirmed the predicted pathways. The synergistic use of sequence similarity networks and GNNs will facilitate the discovery of the components of novel, uncharacterized metabolic pathways in sequenced genomes.
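As a rough illustration of the genome neighborhood idea, the sketch below tallies which protein families occur near members of each sequence similarity network cluster. It is a minimal sketch under stated assumptions, not the authors' implementation: the input layout (an ordered gene list per genome), the function name `genome_neighborhood_network`, the `window` parameter, and the family labels in the example are all hypothetical.

```python
from collections import defaultdict


def genome_neighborhood_network(genomes, query_clusters, window=10):
    """Count neighbor families for each cluster of query genes.

    genomes: dict mapping genome id -> list of (gene_id, family) tuples,
        ordered by position on the chromosome (assumed layout).
    query_clusters: dict mapping query gene_id -> cluster label, for
        example a cluster taken from a sequence similarity network.
    window: number of genes inspected on each side of a query gene
        (an assumed, adjustable cutoff).

    Returns a dict mapping (cluster, family) -> number of times that
    family was found within the window of a query gene in that cluster.
    """
    edges = defaultdict(int)
    for genes in genomes.values():
        for i, (gene_id, _family) in enumerate(genes):
            cluster = query_clusters.get(gene_id)
            if cluster is None:
                continue  # not a query gene
            start = max(0, i - window)
            stop = min(len(genes), i + window + 1)
            for j in range(start, stop):
                if j == i:
                    continue  # skip the query gene itself
                neighbor_family = genes[j][1]
                if neighbor_family is not None:  # ignore unannotated genes
                    edges[(cluster, neighbor_family)] += 1
    return dict(edges)


# Toy example with made-up gene ids and family labels.
toy_genomes = {
    "g1": [("a", "FAM_A"), ("q1", "PRS"), ("b", "FAM_B")],
    "g2": [("q2", "PRS"), ("c", "FAM_A"), ("d", None)],
}
toy_clusters = {"q1": "cluster1", "q2": "cluster1"}
toy_gnn = genome_neighborhood_network(toy_genomes, toy_clusters, window=1)
```

In this toy input, `FAM_A` neighbors a `cluster1` gene in both genomes, so it receives the larger count; families that recur next to one cluster across many genomes are the kind of signal that could suggest a shared pathway.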