Communications Biology (Apr 2024)

A unified analysis of evolutionary and population constraint in protein domains highlights structural features and pathogenic sites

  • Stuart A. MacGowan,
  • Fábio Madeira,
  • Thiago Britto-Borges,
  • Geoffrey J. Barton

DOI
https://doi.org/10.1038/s42003-024-06117-5
Journal volume & issue
Vol. 7, no. 1
pp. 1 – 10

Abstract

Read online

Abstract Protein evolution is constrained by structure and function, creating patterns in residue conservation that are routinely exploited to predict structure and other features. Similar constraints should affect variation across individuals, but it is only with the growth of human population sequencing that this has been tested at scale. Now, human population constraint has established applications in pathogenicity prediction, but it has not yet been explored for structural inference. Here, we map 2.4 million population variants to 5885 protein families and quantify residue-level constraint with a new Missense Enrichment Score (MES). Analysis of 61,214 structures from the PDB spanning 3661 families shows that missense depleted sites are enriched in buried residues or those involved in small-molecule or protein binding. MES is complementary to evolutionary conservation and a combined analysis allows a new classification of residues according to a conservation plane. This approach finds functional residues that are evolutionarily diverse, which can be related to specificity, as well as family-wide conserved sites that are critical for folding or function. We also find a possible contrast between lethal and non-lethal pathogenic sites, and a surprising clinical variant hot spot at a subset of missense enriched positions.