Clinical and Molecular Hepatology (Jan 2022)

Nonalcoholic fatty liver disease and early prediction of gestational diabetes mellitus using machine learning methods

  • Seung Mi Lee,
  • Suhyun Hwangbo,
  • Errol R. Norwitz,
  • Ja Nam Koo,
  • Ig Hwan Oh,
  • Eun Saem Choi,
  • Young Mi Jung,
  • Sun Min Kim,
  • Byoung Jae Kim,
  • Sang Youn Kim,
  • Gyoung Min Kim,
  • Won Kim,
  • Sae Kyung Joo,
  • Sue Shin,
  • Chan-Wook Park,
  • Taesung Park,
  • Joong Shin Park

DOI
https://doi.org/10.3350/cmh.2021.0174
Journal volume & issue
Vol. 28, no. 1
pp. 105 – 116

Abstract

Background/Aims: To develop an early prediction model for gestational diabetes mellitus (GDM) using machine learning and to evaluate whether the inclusion of nonalcoholic fatty liver disease (NAFLD)-associated variables improves the model's performance.

Methods: This prospective cohort study evaluated pregnant women for NAFLD using ultrasound at 10–14 weeks and screened them for GDM at 24–28 weeks of gestation. Clinical variables available before 14 weeks were used to develop prediction models for GDM under five settings: setting 1, conventional risk factors; setting 2, addition of new risk factors in recent guidelines; setting 3, addition of routine clinical variables; setting 4, addition of NAFLD-associated variables, including the presence of NAFLD and laboratory results; and setting 5, the top 11 variables identified by a stepwise variable selection method. The predictive models were constructed using machine learning methods, including logistic regression, random forest, support vector machine, and deep neural networks.

Results: Among 1,443 women, 86 (6.0%) were diagnosed with GDM. The highest performing prediction model among settings 1–4 was setting 4, which included both clinical and NAFLD-associated variables (area under the receiver operating characteristic curve [AUC] 0.563–0.697 in settings 1–3 vs. 0.740–0.781 in setting 4). Setting 5, with the top 11 variables (which included NAFLD and hepatic steatosis index), showed predictive power similar to that of setting 4 (AUC 0.719–0.819 in setting 5; the difference between settings 4 and 5 was not significant).

Conclusions: We developed an early prediction model for GDM using machine learning. The inclusion of NAFLD-associated variables significantly improved the performance of GDM prediction. (ClinicalTrials.gov Identifier: NCT02276144)
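The settings above are compared by AUC. As a rough illustration of what that metric measures, the sketch below computes AUC as the fraction of (GDM, non-GDM) pairs in which the GDM case receives the higher predicted risk, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and made up for illustration; they are not the study's data, and this is not the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = GDM, 0 = no GDM.
labels = [0, 0, 0, 0, 1, 1]
# Hypothetical predicted risks from two models trained on different variable sets.
scores_a = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]
scores_b = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]

auc_a = roc_auc(labels, scores_a)
auc_b = roc_auc(labels, scores_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the ranges reported for settings 1–5 should be read.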

Keywords