Nature Communications (Aug 2024)

Accurate prediction of protein function using statistics-informed graph networks

  • Yaan J. Jang,
  • Qi-Qi Qin,
  • Si-Yu Huang,
  • Arun T. John Peter,
  • Xue-Ming Ding,
  • Benoît Kornmann

DOI
https://doi.org/10.1038/s41467-024-50955-0
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 12

Abstract

Understanding protein function is pivotal to comprehending the intricate mechanisms that underlie many crucial biological activities, with far-reaching implications for medicine, biotechnology, and drug development. However, more than 200 million proteins remain uncharacterized, and computational efforts rely heavily on protein structural information to predict annotations of varying quality. Here, we present PhiGnet, a method that uses statistics-informed graph networks to predict a protein's functions solely from its sequence. Our method inherently characterizes evolutionary signatures, allowing for a quantitative assessment of the significance of residues that carry out specific functions. PhiGnet not only demonstrates superior performance compared to alternative approaches but also narrows the sequence-function gap, even in the absence of structural information. Our findings indicate that applying deep learning to evolutionary data can highlight functional sites at the residue level, providing valuable support for interpreting both existing properties and new functionalities of proteins in research and biomedicine.