Healthcare (Apr 2023)

Application of Machine Learning Algorithms to Predict Uncontrolled Diabetes Using the All of Us Research Program Data

  • Tadesse M. Abegaz,
  • Muktar Ahmed,
  • Fatimah Sherbeny,
  • Vakaramoko Diaby,
  • Hongmei Chi,
  • Askal Ayalew Ali

DOI
https://doi.org/10.3390/healthcare11081138
Journal volume & issue
Vol. 11, no. 8
p. 1138

Abstract

Predictive models for uncontrolled diabetes mellitus are scarce. This study applied several machine learning algorithms to multiple patient characteristics to predict uncontrolled diabetes. Patients with diabetes above the age of 18 from the All of Us Research Program were included. Patients with a record of uncontrolled diabetes, based on International Classification of Diseases codes, were identified as cases. Random forest, extreme gradient boosting, logistic regression, and weighted ensemble models were employed, using a set of features that included basic demographics, biomarkers, and hematological indices. The random forest model achieved the highest accuracy, 0.80 (95% CI: 0.79–0.81), compared with 0.74 (95% CI: 0.73–0.75) for extreme gradient boosting, 0.64 (95% CI: 0.63–0.65) for logistic regression, and 0.77 (95% CI: 0.76–0.79) for the weighted ensemble model. The area under the receiver operating characteristic curve ranged from 0.7 (logistic regression model) to 0.77 (random forest model). Potassium levels, body weight, aspartate aminotransferase, height, and heart rate were important predictors of uncontrolled diabetes. Overall, the random forest model performed best among the models compared, and serum electrolytes and physical measurements were important features. Machine learning techniques may be used to predict uncontrolled diabetes by incorporating these clinical characteristics.
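
To make the reported quantities concrete, the following is a minimal Python sketch of how a weighted ensemble, an accuracy with a 95% confidence interval, and an area under the receiver operating characteristic curve could be computed. It is not the authors' pipeline. The function names, the ensemble weights, the toy labels and probabilities, and the use of a normal-approximation (Wald) interval are all assumptions made for illustration; the abstract does not state how the weights or the intervals were obtained.

```python
import math


def weighted_ensemble(prob_lists, weights):
    """Weighted average of per-model predicted probabilities (soft voting).

    prob_lists: one list of probabilities per model, all of equal length.
    weights: one non-negative weight per model (hypothetical values).
    """
    total = sum(weights)
    n = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n)
    ]


def accuracy_with_ci(y_true, y_pred, z=1.96):
    """Accuracy with a normal-approximation (Wald) interval.

    Assumption: the paper may have used a different interval method.
    """
    n = len(y_true)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


def auc(y_true, scores):
    """AUC as the probability that a random case outscores a random non-case.

    Ties between a case and a non-case count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = uncontrolled diabetes).
labels = [1, 0, 1, 0]
model_a = [0.9, 0.2, 0.3, 0.6]  # e.g. probabilities from one model
model_b = [0.9, 0.0, 0.5, 0.6]  # e.g. probabilities from another model

combined = weighted_ensemble([model_a, model_b], weights=[1, 1])
predicted = [1 if p >= 0.5 else 0 for p in combined]

print(accuracy_with_ci(labels, predicted))
print(auc(labels, combined))
```

With only four toy patients the interval is very wide; the narrow intervals in the abstract reflect a much larger sample.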

Keywords