European Journal of Medical Research (Nov 2024)

Improving methylmalonic acidemia (MMA) screening and MMA genotype prediction using random forest classifier in two Chinese populations

  • Zhe Yin,
  • Chuan Zhang,
  • Rui Dong,
  • Xinyuan Zhang,
  • Yingnan Song,
  • Shengju Hao,
  • Zhongtao Gai,
  • Bingbo Zhou,
  • Ling Hui,
  • Shifan Wang,
  • Huiqin Xue,
  • Zongfu Cao,
  • Yi Liu,
  • Xu Ma

DOI
https://doi.org/10.1186/s40001-024-02115-9
Journal volume & issue
Vol. 29, no. 1
pp. 1 – 11

Abstract

Background: Methylmalonic acidemia (MMA) is one of the most common hereditary disorders of organic acid metabolism and endangers the lives and health of infants and children. Early detection and intervention, before a newborn shows clinical symptoms, can control disease progression and prevent or mitigate its serious consequences.

Methods: A total of 42,004 newborns from two Chinese populations were included in the study. Small-molecule metabolite analytes were measured in dried blood spot (DBS) samples by tandem mass spectrometry (MS/MS). Genetic analysis of 68 Chinese MMA cases was performed by whole-exome sequencing and Sanger sequencing. Random forest classifiers (RFC) were constructed to improve MMA screening performance and genotype prediction in the two populations. Six other machine learning models were also trained to separate MMA patients from normal newborns. Model performance was assessed using accuracy, sensitivity, specificity, false positive rate (FPR), positive predictive value (PPV), and the area under the receiver operating characteristic curve (AUC).

Results: Among the 42,004 newborn samples, 68 MMA cases were identified by genetic analysis, of which 42 were caused by variants in MMACHC, 24 by variants in MMUT, and two by variants in MMAA. Three novel variants were identified in the MMA patients: c.449T>G (p.I150R) in MMACHC, and c.1151C>T (p.S384F) and c.1091_1108delins (p.Y364Sfs*4) in MMUT. The RFC for newborn MMA screening outperformed the other machine learning classification models, achieving 100% sensitivity with low FPR and excellent PPV and AUC. In addition, a subdivision RFC constructed for MMA genotype prediction showed superior performance.

Conclusions: The RFC is highly useful for both detection and genotype prediction in newborn MMA screening. In addition, our findings extend the variant spectrum of genes related to MMA.
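The evaluation metrics named in the Methods can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code. The names `ConfusionCounts`, `screening_metrics`, and `auc_from_scores` are hypothetical, and the counts in the usage lines are made-up placeholders, not results from the study. AUC is computed here with the rank-based (Mann-Whitney) formulation, one standard way to obtain it; the abstract does not state how the authors computed it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing screening calls with confirmed diagnoses."""
    tp: int  # MMA cases flagged by the classifier
    fp: int  # unaffected newborns flagged by the classifier
    tn: int  # unaffected newborns not flagged
    fn: int  # MMA cases missed by the classifier


def screening_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity, specificity, FPR and PPV.

    Assumes every denominator is non-zero, i.e. the data contain at least
    one affected and one unaffected newborn and at least one positive call.
    """
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "fpr": c.fp / (c.fp + c.tn),
        "ppv": c.tp / (c.tp + c.fp),
    }


def auc_from_scores(pos_scores, neg_scores) -> float:
    """AUC as the probability that a random affected sample scores higher
    than a random unaffected sample, with ties counted as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Placeholder counts for illustration only (not data from the study).
    example = ConfusionCounts(tp=10, fp=5, tn=985, fn=0)
    for name, value in screening_metrics(example).items():
        print(f"{name}: {value:.4f}")
    # Placeholder classifier scores for affected and unaffected samples.
    print(f"auc: {auc_from_scores([0.9, 0.8], [0.1, 0.8]):.4f}")
```

In a rare-disease screening setting, PPV is worth reporting alongside specificity: with very few true cases, even a small FPR can produce many more false positives than true positives.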

Keywords