Genome Biology (Jan 2024)

RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci

  • Sarah Fazal,
  • Matt C. Danzi,
  • Isaac Xu,
  • Shilpa Nadimpalli Kobren,
  • Shamil Sunyaev,
  • Chloe Reuter,
  • Shruti Marwaha,
  • Matthew Wheeler,
  • Egor Dolzhenko,
  • Francesca Lucas,
  • Stefan Wuchty,
  • Mustafa Tekin,
  • Stephan Züchner,
  • Vanessa Aguiar-Pulido

DOI
https://doi.org/10.1186/s13059-024-03171-4
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 22

Abstract

Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT’s high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies.

Keywords