Computational and Structural Biotechnology Journal (Dec 2024)

Using AI-predicted protein structures as a reference to predict loss-of-function activity in tumor suppressor breast cancer genes

  • Rohan Gnanaolivu
  • Steven N. Hart

Journal volume & issue
Vol. 23
pp. 3472 – 3480

Abstract

Background: Most missense variants in the tumor suppressor breast cancer genes BRCA1, BRCA2, PALB2, and RAD51C have no loss-of-function (LOF) classification, which confounds clinical actionability. Classifying these variants is challenging because they are rare, so clinicians rely on in silico predictive methods. Changes in protein stability are associated with changes in function, which makes stability predictors valuable. Predicting the stability effect of a missense variant requires a high-resolution protein structure, but such structures are often unavailable. This study explores using generative AI to predict high-resolution protein structures, which can then be analyzed with in silico protein stability prediction methods to assess LOF activity in ordered regions of the protein. It also identifies which in silico protein stability methods and dedicated in silico missense prediction methods in the dbNSFP v4.7 database are appropriate for predicting LOF activity in ordered regions of these four genes. Functional classifications from homology-directed DNA repair (HDR) assays and variant classifications from the ClinVar database provide a reliable dataset for evaluating the performance of these in silico prediction methods.

Results: Complex AlphaFold2 structures of the BRCA1 C-terminal (BRCT) domain and the DNA-binding (DB) domain of BRCA2, analyzed with the protein stability tool FoldX, predict LOF activity from missense variants significantly better than experimentally derived structures in ordered regions. The BRCT domain achieved an area under the curve (AUC) = 0.861 (95% CI: 0.858–0.863) and AUC = 0.842 (95% CI: 0.840–0.845), while the DB domain achieved AUC = 0.836 (95% CI: 0.8322–0.841), compared with AUC = 0.847 (95% CI: 0.844–0.850) and AUC = 0.835 (95% CI: 0.832–0.837) for the BRCT domain and AUC = 0.830 (95% CI: 0.821–0.8320) for the DB domain from experimentally derived structures. Protein stability does not predict LOF activity from missense variants better than dedicated in silico missense predictors. Overall, we find that AlphaMissense ranks highly, with an average AUC = 0.890 (95% CI: 0.886–0.895) across ordered regions of these four cancer genes, compared with all other in silico missense predictors present in the dbNSFP database.

Conclusions: The study shows that protein structures predicted by generative AI can outperform experimentally derived structures for evaluating LOF activity from predicted protein stability in ordered regions of BRCA1, BRCA2, PALB2, and RAD51C. It also highlights AlphaMissense as the leading in silico missense prediction method for predicting LOF activity from missense variants in these four tumor suppressor breast cancer genes. The code for this study is freely available on GitHub (https://github.com/rohandavidg/CarePred).
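The comparisons above rest on AUC values with 95% confidence intervals. The abstract does not say how those intervals were computed, so the following is only a minimal sketch of one common approach: a rank-based AUC with a percentile bootstrap. The function names `roc_auc` and `bootstrap_auc_ci`, the toy labels, and the stability scores are all hypothetical and are not taken from the study or its repository.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen LOF variant (label 1)
    scores higher than a randomly chosen functional variant (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method,
    not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    low = aucs[int((alpha / 2) * n_boot)]
    high = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return low, high


# Hypothetical data: 1 = LOF, 0 = functional; scores stand in for predicted
# stability changes (larger = more destabilizing). Values are made up.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [3.1, 2.4, 0.4, 1.9, 0.2, 0.9, -0.3, 0.5]

auc = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```

With only eight made-up variants the interval is very wide; the narrow intervals reported in the abstract would require far larger variant sets or a different resampling scheme.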

Keywords