Diabetes Epidemiology and Management (Jan 2024)
Predictive modeling for the development of diabetes mellitus using key factors in various machine learning approaches
Abstract
Aims: Machine learning (ML) approaches are useful when relevant features must be identified automatically from among numerous candidates. We investigated the ability of several ML models to predict new-onset diabetes mellitus. Methods: In 10,248 subjects who underwent annual health examinations, 58 candidate variables were evaluated. These included the fatty liver index (FLI), which is calculated from waist circumference, body mass index and serum levels of triglycerides and γ-glutamyl transferase. Results: During a follow-up period of up to 10 years (mean: 6.9 years), 322 subjects (6.5%) in the training group (70%, n=7,173) and 127 subjects (6.2%) in the test group (30%, n=3,075) developed new-onset diabetes mellitus. Random forest feature selection with 10-fold cross-validation identified hemoglobin A1c, fasting glucose and FLI as the top 3 predictors. When hemoglobin A1c and FLI were used as the selected features, the C-statistics (equivalent to the area under the receiver operating characteristic curve) of the ML models, including logistic regression, naïve Bayes, extreme gradient boosting and an artificial neural network, were 0.874, 0.869, 0.856 and 0.869, respectively. There was no significant difference in discriminatory capacity among the ML models. Conclusions: ML models incorporating hemoglobin A1c and FLI provide an accurate and straightforward approach for predicting the development of diabetes mellitus.
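The abstract names the inputs to FLI but does not restate its equation. The following is a minimal sketch, assuming the widely used equation originally proposed by Bedogni and colleagues and its usual units (triglycerides in mg/dL, BMI in kg/m², γ-glutamyl transferase in U/L, waist circumference in cm). The function name `fatty_liver_index` is hypothetical, and the study may have used a different variant or unit convention.

```python
import math


def fatty_liver_index(tg_mg_dl: float, bmi: float, ggt_u_l: float, waist_cm: float) -> float:
    """Return the fatty liver index on a 0-100 scale.

    Assumption: the Bedogni et al. equation; the source abstract does not
    state which FLI formula or units were used.
    """
    if tg_mg_dl <= 0 or ggt_u_l <= 0:
        raise ValueError("triglycerides and GGT must be positive (log-transformed)")
    # Linear predictor: log-transformed lab values plus anthropometric terms.
    y = (0.953 * math.log(tg_mg_dl)
         + 0.139 * bmi
         + 0.718 * math.log(ggt_u_l)
         + 0.053 * waist_cm
         - 15.745)
    # Logistic transform scaled to 0-100; 1 / (1 + e^-y) equals e^y / (1 + e^y).
    return 100.0 / (1.0 + math.exp(-y))


# Example subject: TG 150 mg/dL, BMI 27, GGT 40 U/L, waist 95 cm.
print(round(fatty_liver_index(150, 27, 40, 95), 1))
```

Because FLI combines four routinely measured quantities into one score, it can enter the prediction models as a single feature alongside hemoglobin A1c.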
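A C-statistic such as 0.874 can be read as a concordance probability: the chance that a randomly chosen subject who developed diabetes receives a higher predicted risk than a randomly chosen subject who did not. The following is a minimal sketch of that computation, assuming binary outcome labels (1 = new-onset diabetes) and one predicted risk per subject. `c_statistic` is a hypothetical helper for illustration, not the authors' code.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Concordance (C) statistic for a binary outcome.

    Equals the area under the ROC curve: the fraction of (case, non-case)
    pairs in which the case has the higher score, with ties counted as 1/2.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0  # case ranked above non-case
            elif c == n:
                concordant += 0.5  # tied scores count as half
    return concordant / (len(cases) * len(controls))


# Four subjects: two non-cases and two cases with hypothetical predicted risks.
print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop costs O(cases × non-cases), which is manageable at the scale of the test group here. A rank-based (Mann-Whitney) formulation gives the same value in O(n log n).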