BMC Bioinformatics (Feb 2024)

A supervised learning method for classifying methylation disorders

  • Jesse R. Walsh,
  • Guangchao Sun,
  • Jagadheshwar Balan,
  • Jayson Hardcastle,
  • Jason Vollenweider,
  • Calvin Jerde,
  • Kandelaria Rumilla,
  • Christy Koellner,
  • Alaa Koleilat,
  • Linda Hasadsri,
  • Benjamin Kipp,
  • Garrett Jenkinson,
  • Eric Klee

DOI
https://doi.org/10.1186/s12859-024-05673-1
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 15

Abstract

Background
DNA methylation is one of the most stable and well-characterized epigenetic alterations in humans. Accordingly, it has already found clinical utility as a molecular biomarker in a variety of disease contexts. Existing methods for the clinical diagnosis of methylation-related disorders focus on outlier detection at a small number of CpG sites, using standardized cutoffs that separate healthy from abnormal methylation levels. These standardized cutoffs do not account for methylation patterns that are known to differ between the sexes and with age.

Results
Here we profile genome-wide DNA methylation in blood samples from a cohort of healthy controls of different ages and sexes, alongside patients with Prader–Willi syndrome (PWS), Beckwith–Wiedemann syndrome, Fragile-X syndrome, Angelman syndrome, and Silver–Russell syndrome. We propose a Generalized Additive Model to perform age- and sex-adjusted outlier analysis of around 700,000 CpG sites throughout the human genome. Using each site's z-scores within the cohort, we deployed an ensemble-based machine learning pipeline and achieved a combined prediction accuracy of 0.96 (binomial 95% confidence interval 0.868–0.995).

Conclusion
We demonstrate a method for age- and sex-adjusted outlier detection of differentially methylated loci based on a large cohort of healthy individuals. We present a custom machine learning pipeline that uses this outlier analysis to classify samples for potential methylation-associated congenital disorders. When combined with machine learning, these methods classify abnormal methylation patterns with high accuracy.
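The abstract does not specify the model's basis, penalty, or variance estimate, so the following is only a minimal NumPy sketch of the general idea of age- and sex-adjusted outlier scoring, not the authors' implementation. It assumes that each CpG site's methylation in healthy controls can be modeled as an intercept, plus a linear sex effect, plus a penalized cubic regression spline in age (a simple GAM-style smooth term). The residual standard deviation from the controls then converts a new sample's deviation from its age- and sex-specific expectation into a z-score. The function names `fit_controls` and `adjusted_z`, the truncated-power spline basis, the number of knots, and the penalty weight `lam` are all hypothetical choices made for illustration.

```python
import numpy as np


def _design(age, sex, center, scale, knots):
    """Columns: intercept, sex indicator, cubic in scaled age, truncated-power knot terms."""
    a = (np.atleast_1d(np.asarray(age, dtype=float)) - center) / scale
    s = np.atleast_1d(np.asarray(sex, dtype=float))
    cols = [np.ones_like(a), s, a, a**2, a**3]
    cols += [np.clip(a - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_controls(age, sex, beta, n_knots=5, lam=1.0):
    """Fit a penalized additive model (sex + smooth age) per CpG site on healthy controls.

    beta has shape (n_controls, n_sites). All sites share one design matrix,
    so a single linear solve fits every site at once.
    """
    age = np.asarray(age, dtype=float)
    beta = np.asarray(beta, dtype=float)
    center, scale = age.mean(), age.std()
    # Interior knots at evenly spaced quantiles of the scaled control ages.
    knots = np.quantile((age - center) / scale, np.linspace(0, 1, n_knots + 2)[1:-1])
    X = _design(age, sex, center, scale, knots)
    # Penalize only the knot coefficients; intercept, sex and the cubic trend stay free.
    penalty = np.diag([0.0] * 5 + [1.0] * len(knots))
    A = X.T @ X + lam * penalty
    coef = np.linalg.solve(A, X.T @ beta)            # shape (n_params, n_sites)
    resid = beta - X @ coef
    edf = np.trace(np.linalg.solve(A, X.T @ X))      # effective degrees of freedom
    sigma = np.sqrt((resid ** 2).sum(axis=0) / (len(age) - edf))
    return {"center": center, "scale": scale, "knots": knots, "coef": coef, "sigma": sigma}


def adjusted_z(model, age, sex, beta):
    """Age- and sex-adjusted z-scores, shape (n_samples, n_sites)."""
    X = _design(age, sex, model["center"], model["scale"], model["knots"])
    beta = np.atleast_2d(np.asarray(beta, dtype=float))
    return (beta - X @ model["coef"]) / model["sigma"]
```

For example, `model = fit_controls(ages, sexes, control_betas)` followed by `adjusted_z(model, [40.0], [1], patient_betas)` scores one 40-year-old patient coded as `sex = 1` against controls of matching age and sex at every site. In this sketch, the fact that all sites share a single design matrix means one solve covers every site, which keeps a genome-wide fit cheap. It also fixes the smoothing penalty by hand and ignores the uncertainty in the fitted mean. Dedicated GAM software such as R's mgcv instead selects the smoothing parameter automatically (for example by REML or GCV).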

Keywords