Scientific Data (May 2024)

Analysis of AlphaMissense data in different protein groups and structural context

  • Hedvig Tordai,
  • Odalys Torres,
  • Máté Csepi,
  • Rita Padányi,
  • Gergely L. Lukács,
  • Tamás Hegedűs

DOI
https://doi.org/10.1038/s41597-024-03327-8
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 9

Abstract

Read online

Abstract Single amino acid substitutions can profoundly affect protein folding, dynamics, and function. The ability to discern between benign and pathogenic substitutions is pivotal for therapeutic interventions and research directions. Given the limitations in experimental examination of these variants, AlphaMissense has emerged as a promising predictor of the pathogenicity of missense variants. Since heterogenous performance on different types of proteins can be expected, we assessed the efficacy of AlphaMissense across several protein groups (e.g. soluble, transmembrane, and mitochondrial proteins) and regions (e.g. intramembrane, membrane interacting, and high confidence AlphaFold segments) using ClinVar data for validation. Our comprehensive evaluation showed that AlphaMissense delivers outstanding performance, with MCC scores predominantly between 0.6 and 0.74. We observed low performance on disordered datasets and ClinVar data related to the CFTR ABC protein. However, a superior performance was shown when benchmarked against the high quality CFTR2 database. Our results with CFTR emphasizes AlphaMissense’s potential in pinpointing functional hot spots, with its performance likely surpassing benchmarks calculated from ClinVar and ProteinGym datasets.