Frontiers in Endocrinology (Sep 2024)

Study on risk factors of impaired fasting glucose and development of a prediction model based on Extreme Gradient Boosting algorithm

  • Qiyuan Cui,
  • Jianhong Pu,
  • Wei Li,
  • Yun Zheng,
  • Jiaxi Lin,
  • Lu Liu,
  • Peng Xue,
  • Jinzhou Zhu,
  • Mingqing He

DOI
https://doi.org/10.3389/fendo.2024.1368225
Journal volume & issue
Vol. 15

Abstract

Read online

ObjectiveThe aim of this study was to develop and validate a machine learning-based model to predict the development of impaired fasting glucose (IFG) in middle-aged and older elderly people over a 5-year period using data from a cohort study.MethodsThis study was a retrospective cohort study. The study population was 1855 participants who underwent consecutive physical examinations at the First Affiliated Hospital of Soochow University between 2018 and 2022.The dataset included medical history, physical examination, and biochemical index test results. The cohort was randomly divided into a training dataset and a validation dataset in a ratio of 8:2. The machine learning algorithms used in this study include Extreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), Naive Bayes, Decision Trees (DT), and traditional Logistic Regression (LR). Feature selection, parameter optimization, and model construction were performed in the training set, while the validation set was used to evaluate the predictive performance of the models. The performance of these models is evaluated by an area under the receiver operating characteristic (ROC) curves (AUC), calibration curves and decision curve analysis (DCA). To interpret the best-performing model, the Shapley Additive exPlanation (SHAP) Plots was used in this study.ResultsThe training/validation dataset consists of 1,855 individuals from the First Affiliated Hospital of Soochow University, yielded significant variables following selection by the Boruta algorithm and logistic multivariate regression analysis. These significant variables included systolic blood pressure (SBP), fatty liver, waist circumference (WC) and serum creatinine (Scr). The XGBoost model outperformed the other models, demonstrating an AUC of 0.7391 in the validation set.ConclusionsThe XGBoost model was composed of SBP, fatty liver, WC and Scr may assist doctors with the early identification of IFG in middle-aged and elderly people.

Keywords