Diabetes, Metabolic Syndrome and Obesity (Sep 2022)

Predicting Diabetes in Patients with Metabolic Syndrome Using Machine-Learning Model Based on Multiple Years’ Data

  • Li J,
  • Xu Z,
  • Xu T,
  • Lin S

Journal volume & issue
Vol. 15
pp. 2951–2961

Abstract


Jing Li1,*, Zheng Xu2,*, Tengda Xu1, Songbai Lin1

1Department of Health Management, Peking Union Medical College Hospital, Beijing, People’s Republic of China; 2Department of AI Research, Digital Health China Technologies Co. Ltd, Beijing, People’s Republic of China

*These authors contributed equally to this work.

Correspondence: Songbai Lin, Department of Health Management, Peking Union Medical College Hospital, 1# Shuaifuyuan, Dongcheng District, Beijing, 100730, People’s Republic of China, Tel +86 10 6915 9901, Fax +86 10 6915 9901, Email [email protected]

Purpose: To evaluate the performance of machine-learning models based on multiple years of continuous data in predicting incident diabetes among patients with metabolic syndrome.

Patients and Methods: The dataset comprises health records from 2008 to 2020 for 4510 nondiabetic participants who had metabolic syndrome (MetS) at baseline and at least 6 years of records. MetS was defined according to the International Diabetes Federation (IDF) criteria. Overall, 332 patients developed incident diabetes during 7 ± 1.4 years of follow-up. Three popular classification algorithms were evaluated on the dataset: logistic regression, random forest, and XGBoost. For each algorithm, five models were developed: three single-year models (year 1, year 2, and year 3) and two multiple-year models (year 1–2 and year 1–3).

Results: Model performance improved as more years of longitudinal data were included, with the area under the receiver operating characteristic curve (AUROC) increasing for both random forest (year 1–3: AUROC = 0.893; year 3: AUROC = 0.862; year 1–2: AUROC = 0.847; year 2: AUROC = 0.838) and XGBoost (year 1–3: AUROC = 0.897; year 3: AUROC = 0.833; year 1–2: AUROC = 0.856; year 2: AUROC = 0.823). In the multiple-year models, the highest fasting plasma glucose had the greatest predictive value for the onset of diabetes, followed by the mean or lowest levels of HbA1c and BMI. In the year 1–3 model, “delta weight”, which reflects fluctuations in the year-to-year change in weight, was the fourth most important feature.

Conclusion: This study demonstrated that the performance of machine-learning models for diabetes prediction in MetS patients improved as longitudinal data accumulated. For individuals with similar clinical parameters, the trends in those parameters over time could change the risk of future diabetes. This result indicates that models based on multiple years of longitudinal data may provide more personalized tools for risk evaluation.

Keywords: diabetes, metabolic syndrome, machine-learning method, prevention
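As a rough illustration of how a multiple-year model differs from a single-year one, the sketch below collapses one patient's yearly check-up records into summary features of the kind named in the Results (highest fasting plasma glucose, mean and lowest HbA1c, BMI, and a "delta weight") and scores predictions with AUROC. This is a minimal sketch under stated assumptions, not the authors' pipeline: the field names (`fpg`, `hba1c`, `bmi`, `weight`), the helper names `multi_year_features` and `auroc`, the use of mean BMI, and the definition of delta weight as the population standard deviation of year-over-year weight changes are all hypothetical, since the abstract does not specify them. The model-fitting step (logistic regression, random forest, XGBoost) is omitted.

```python
from statistics import mean, pstdev


def multi_year_features(records):
    """Collapse one patient's yearly check-ups (ordered by year) into a flat feature dict.

    Each record is a dict with the HYPOTHETICAL keys 'fpg', 'hba1c', 'bmi', 'weight'.
    """
    fpg = [r["fpg"] for r in records]
    hba1c = [r["hba1c"] for r in records]
    bmi = [r["bmi"] for r in records]
    weight = [r["weight"] for r in records]
    # Year-over-year weight changes; "delta weight" is ASSUMED to be their population SD.
    changes = [later - earlier for earlier, later in zip(weight, weight[1:])]
    return {
        "fpg_max": max(fpg),          # highest fasting plasma glucose across years
        "hba1c_mean": mean(hba1c),    # mean HbA1c across years
        "hba1c_min": min(hba1c),      # lowest HbA1c across years
        "bmi_mean": mean(bmi),        # ASSUMED aggregation for BMI
        "delta_weight": pstdev(changes) if changes else 0.0,
    }


def auroc(labels, scores):
    """AUROC via the Mann-Whitney statistic: P(random case scores above random non-case).

    Ties count as one half. labels are 1 (incident diabetes) or 0 (no diabetes).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up records for one patient over years 1-3 (not data from the study).
patient = [
    {"fpg": 5.8, "hba1c": 5.9, "bmi": 27.1, "weight": 80.0},
    {"fpg": 6.1, "hba1c": 6.0, "bmi": 27.6, "weight": 81.5},
    {"fpg": 6.4, "hba1c": 6.1, "bmi": 27.3, "weight": 80.7},
]
print(multi_year_features(patient))
print(auroc([1, 0, 1, 0], [0.9, 0.3, 0.6, 0.7]))
```

In a real pipeline, each patient's feature dict would become one row of the design matrix passed to a classifier, and a single-year model would simply use that year's raw values instead of the aggregates. The pairwise AUROC loop is O(n·m) and is meant only to make the metric's definition concrete.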
