Scientific Reports (Dec 2022)

Machine learning approaches demonstrate that protein structures carry information about their genetic coding

  • Linor Ackerman-Schraier,
  • Aviv A. Rosenberg,
  • Ailie Marx,
  • Alex M. Bronstein

DOI: https://doi.org/10.1038/s41598-022-25874-z
Journal volume & issue: Vol. 12, no. 1, pp. 1–10

Abstract

Synonymous codons translate into the same amino acid. Although the identity of synonymous codons is often considered inconsequential to the final protein structure, there is mounting evidence for an association between the two. Our study examined this association using regression and classification models. We found that codon sequences predict protein backbone dihedral angles with a lower error than amino acid sequences, and that, given structural information, models trained with true dihedral angles classify synonymous codons better than models trained with random dihedral angles. Using this classification approach, we investigated local codon–codon dependencies and tested whether synonymous codon identity can be predicted more accurately from codon context than from amino acid context alone and, more specifically, which codon context position carries the most predictive power.
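The two quantities the abstract compares can be made concrete with a small sketch. It groups codons by the amino acid they encode, which yields the synonymous-codon classes a classifier must choose among. It also computes a wrapped absolute error for dihedral angles, so that predictions of 170° and -170° count as 20° apart rather than 340°. This illustration rests on two assumptions. First, it uses the standard genetic code. Second, its error metric is a plain wrapped absolute error in degrees, which is not necessarily the metric the authors used. The names `CODON_TABLE`, `synonymous_groups`, `angular_error_deg` and `mean_angular_error` are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code (assumed), one amino acid per codon with codons
# enumerated in TCAG order: first base slowest, third base fastest; '*' = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Hypothetical lookup table: codon string -> one-letter amino acid code.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_groups(table):
    """Group codons by the amino acid (or stop) they translate into."""
    groups = defaultdict(list)
    for codon, aa in table.items():
        groups[aa].append(codon)
    return dict(groups)


def angular_error_deg(pred, true):
    """Absolute difference between two angles in degrees, wrapped into [0, 180]."""
    diff = (pred - true) % 360.0
    return min(diff, 360.0 - diff)


def mean_angular_error(preds, trues):
    """Mean wrapped absolute error over paired predicted and true angles."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, trues)]
    return sum(errors) / len(errors)


groups = synonymous_groups(CODON_TABLE)
print(groups["L"])                                # six synonymous codons for leucine
print(groups["M"])                                # methionine has a single codon
print(angular_error_deg(170, -170))               # wraps around to 20.0
print(mean_angular_error([10, 350], [0, 10]))     # (10 + 20) / 2
```

Wrapping the difference matters because dihedral angles are periodic. A naive absolute error would heavily penalize predictions that are in fact close to the true angle across the ±180° boundary.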