Endocrine Connections (Nov 2024)

Application of machine learning algorithm incorporating dietary intake in prediction of gestational diabetes mellitus

  • Tianze Ding,
  • Peijie Liu,
  • Jie Jia,
  • Hui Wu,
  • Jie Zhu,
  • Kefeng Yang

DOI
https://doi.org/10.1530/EC-24-0169
Journal volume & issue
Vol. 13, no. 12
pp. 1 – 8

Abstract

Introduction: Gestational diabetes mellitus (GDM) significantly affects pregnancy outcomes. Prediction models are therefore important, since they can guide timely interventions to reduce the incidence of GDM and its associated adverse effects.

Methods: A total of 554 pregnant women were enrolled, and their sociodemographic characteristics, clinical data and dietary data were collected. Dietary data were collected using a validated semi-quantitative food frequency questionnaire (FFQ). Random forest mean decrease in impurity was applied for feature selection, and the models were built using logistic regression, XGBoost and LightGBM algorithms. The prediction performance of the models was compared by accuracy, sensitivity, specificity, area under the curve (AUC) and the Hosmer–Lemeshow test.

Results: Blood glucose, age, pre-pregnancy body mass index (BMI), triglycerides and high-density lipoprotein cholesterol (HDL) were the top five features according to the feature selection. Among the three algorithms, XGBoost performed best with an AUC of 0.788, LightGBM came second (AUC = 0.749), and logistic regression performed worst (AUC = 0.712). In addition, both XGBoost and LightGBM achieved a higher AUC when dietary information was included than on the non-dietary dataset (XGBoost: 0.788 vs 0.718; LightGBM: 0.749 vs 0.726).

Conclusion: XGBoost and LightGBM outperformed logistic regression in predicting GDM among Chinese pregnant women. In addition, dietary data may improve model performance, which deserves more in-depth investigation with a larger sample size.
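For readers unfamiliar with the comparison metrics named in the Methods, the following is a minimal sketch of how accuracy, sensitivity, specificity and AUC can be computed from true labels and predicted probabilities. It is not the authors' code: the function names, the 0.5 classification threshold and the toy labels and scores are illustrative assumptions and are not data from the study. AUC is taken here to be the area under the ROC curve, computed in its rank-based form as the probability that a randomly chosen GDM case receives a higher score than a randomly chosen non-case.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels coded 1 = GDM, 0 = no GDM."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true: Sequence[int], scores: Sequence[float],
                           threshold: float = 0.5) -> dict:
    """Accuracy, sensitivity and specificity at an assumed probability threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: fraction of (case, non-case) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(n_pos * n_neg) time,
    which is fine for a sketch on small data.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, NOT study data).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

metrics = classification_metrics(toy_labels, toy_scores)
metrics["auc"] = roc_auc(toy_labels, toy_scores)
print(metrics)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is presumably why the abstract reports AUC as the headline figure for each algorithm.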

Keywords