AlphaFold predictions of fold-switched conformations are driven by structure memorization

Devlina Chakravarty; Joseph W. Schafer; Ethan A. Chen; Joseph F. Thole; Leslie A. Ronish; Myeongsang Lee; Lauren L. Porter

doi:10.1038/s41467-024-51801-z

Nature Communications (Aug 2024)

Affiliations

Devlina Chakravarty: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
Joseph W. Schafer: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
Ethan A. Chen: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
Joseph F. Thole: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
Leslie A. Ronish: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
Myeongsang Lee: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
Lauren L. Porter: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health

DOI: https://doi.org/10.1038/s41467-024-51801-z
Journal volume & issue: Vol. 15, no. 1
pp. 1 – 13

Abstract

Recent work suggests that AlphaFold (AF), a deep learning-based model that can accurately infer protein structure from sequence, may discern important features of folded protein energy landscapes, defined by the diversity and frequency of different conformations in the folded state. Here, we test the limits of its predictive power on fold-switching proteins, which assume two structures with regions of distinct secondary and/or tertiary structure. We find that (1) AF is a weak predictor of fold switching and (2) some of its successes result from memorization of training-set structures rather than learned protein energetics. Combining >280,000 models from several implementations of AF2 and AF3, a 35% success rate was achieved for fold switchers likely in AF’s training sets. AF2’s confidence metrics selected against models consistent with experimentally determined fold-switching structures and failed to discriminate between low- and high-energy conformations. Further, AF captured only one out of seven experimentally confirmed fold switchers outside of its training sets despite extensive sampling of an additional ~280,000 models. Several observations indicate that AF2 has memorized structural information during training, and AF3 misassigns coevolutionary restraints. These limitations constrain the scope of successful predictions, highlighting the need for physically based methods that readily predict multiple protein conformations.

Published in Nature Communications

ISSN: 2041-1723 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/ncomms/
