Protein language models trained on multiple sequence alignments learn phylogenetic relationships

Umberto Lupo; Damiano Sgarbossa; Anne-Florence Bitbol

doi:10.1038/s41467-022-34032-y

Nature Communications (Oct 2022)

Affiliations

Umberto Lupo: Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL)
Damiano Sgarbossa: Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL)
Anne-Florence Bitbol: Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL)

Journal volume & issue: Vol. 13, no. 1, pp. 1–11

Abstract

Protein language models taking multiple sequence alignments as inputs capture protein structure and mutational effects. Here, the authors show that these models also encode phylogenetic relationships, and can disentangle correlations due to structural constraints from those due to phylogeny.

Published in Nature Communications

ISSN: 2041-1723 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/ncomms/
