Discordance between a deep learning model and clinical-grade variant pathogenicity classification in a rare disease cohort

Sek Won Kong; In-Hee Lee; Lauren V. Collen; Michael Field; Arjun K. Manrai; Scott B. Snapper; Kenneth D. Mandl

doi:10.1038/s41525-025-00480-w

npj Genomic Medicine (Feb 2025)

Affiliations

Sek Won Kong: Computational Health Informatics Program, Boston Children’s Hospital
In-Hee Lee: Computational Health Informatics Program, Boston Children’s Hospital
Lauren V. Collen: Department of Pediatrics, Harvard Medical School
Michael Field: Department of Pediatrics, Harvard Medical School
Arjun K. Manrai: Department of Biomedical Informatics, Harvard Medical School
Scott B. Snapper: Department of Pediatrics, Harvard Medical School
Kenneth D. Mandl: Computational Health Informatics Program, Boston Children’s Hospital

Journal volume & issue: Vol. 10, no. 1, pp. 1–8

Abstract

Genetic testing is essential for diagnosing and managing clinical conditions, particularly rare Mendelian diseases. Although efforts to identify rare phenotype-associated variants have focused on protein-truncating variants, interpreting missense variants remains challenging. Deep learning algorithms excel in various biomedical tasks [1,2], yet distinguishing pathogenic from benign missense variants remains elusive [3–5]. Our investigation of AlphaMissense (AM) [5], a deep learning tool for predicting the potential functional impact of missense variants and assessing gene essentiality, reveals limitations in identifying pathogenic missense variants over 45 rare diseases, including very early onset inflammatory bowel disease. For the expert-curated pathogenic variants identified in our cohort, AM’s precision was 32.9%, and recall was 57.6%. Notably, AM struggles to evaluate pathogenicity in intrinsically disordered regions (IDRs), resulting in unreliable gene-level essentiality scores for genes containing IDRs. This observation underscores ongoing challenges in clinical genetics, highlighting the need for continued refinement of computational methods in variant pathogenicity prediction.
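
For readers less familiar with the two metrics quoted in the abstract, the following minimal sketch shows how precision and recall are conventionally computed when a model's pathogenic calls are compared against expert-curated labels. The function name and the toy labels are hypothetical and purely illustrative; they are not the cohort data or the authors' evaluation code.

```python
def precision_recall(predicted, curated):
    """Compute precision and recall for binary pathogenicity calls.

    predicted: list of bools, True if the model calls the variant pathogenic
    curated:   list of bools, True if experts curated the variant as pathogenic
    Both lists are parallel (one entry per variant).
    """
    pairs = list(zip(predicted, curated))
    tp = sum(1 for p, c in pairs if p and c)        # model and experts agree: pathogenic
    fp = sum(1 for p, c in pairs if p and not c)    # model says pathogenic, experts do not
    fn = sum(1 for p, c in pairs if not p and c)    # model misses a curated pathogenic variant

    # Precision: fraction of the model's pathogenic calls that experts confirm.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    # Recall: fraction of expert-curated pathogenic variants the model recovers.
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Toy labels for five hypothetical variants (NOT data from the study).
toy_predicted = [True, True, True, False, False]
toy_curated = [True, False, False, True, False]
toy_precision, toy_recall = precision_recall(toy_predicted, toy_curated)
print(f"precision={toy_precision:.3f}, recall={toy_recall:.3f}")
```

Under these definitions, a precision of 32.9% means roughly one in three of the model's pathogenic calls matched expert curation, and a recall of 57.6% means a little over half of the curated pathogenic variants were flagged by the model.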

Published in npj Genomic Medicine

ISSN: 2056-7944 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science: Biology (General): Genetics
Website: https://www.nature.com/npjgenmed/