A Comparative Analysis of Machine Learning Models for the Detection of Undiagnosed Diabetes Patients

Simon Lebech Cichosz; Clara Bender; Ole Hejlesen

doi:10.3390/diabetology5010001

Diabetology (Jan 2024)

Affiliations

Simon Lebech Cichosz: Department of Health Science and Technology, Aalborg University, 9000 Aalborg, Denmark
Clara Bender: Department of Health Science and Technology, Aalborg University, 9000 Aalborg, Denmark
Ole Hejlesen: Department of Health Science and Technology, Aalborg University, 9000 Aalborg, Denmark

DOI: https://doi.org/10.3390/diabetology5010001
Journal volume & issue: Vol. 5, no. 1
pp. 1 – 11

Abstract

Introduction: Early detection of type 2 diabetes is essential for preventing long-term complications. However, screening the entire population for diabetes is not cost-effective, so identifying individuals at high risk for the disease is crucial. The aim of this study was to compare the performance of five diverse machine learning (ML) models in classifying undiagnosed diabetes using large, heterogeneous datasets.

Methods: We used data from the National Health and Nutrition Examination Survey (NHANES) covering the years 2005 to 2018 to identify people with undiagnosed diabetes. The dataset included 45,431 participants, and undiagnosed diabetes was identified by biochemical confirmation of glucose control (HbA1c). The predictors were simple, clinically obtainable variables, which could make the models feasible for prescreening for diabetes. Five ML models were compared: random forest, AdaBoost, RUSBoost, LogitBoost, and a neural network.

Results: The prevalence of undiagnosed diabetes was 4%. For the classification of undiagnosed diabetes, the area under the ROC curve (AUC) ranged from 0.776 to 0.806. The positive predictive values (PPVs) ranged from 0.083 to 0.091, the negative predictive values (NPVs) from 0.984 to 0.99, and the sensitivities from 0.742 to 0.871.

Conclusion: We have demonstrated that several types of classification models can accurately classify undiagnosed diabetes from simple, clinically obtainable variables. These results suggest that machine learning could be a useful prescreening tool for undiagnosed diabetes in clinical practice.
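The combination of low PPVs and high NPVs reported above is largely what one expects when screening for a condition with 4% prevalence. A minimal sketch, assuming a binary screening test with known sensitivity and specificity, applies Bayes' theorem to show this effect. The specificity values used are hypothetical, chosen only for illustration; the abstract does not report specificity, and this is not the authors' analysis code.

```python
"""Illustrative sketch: how prevalence drives PPV and NPV in screening.

The sensitivity/specificity pairs in the demo are hypothetical and are not
taken from the study; only the 4% prevalence comes from the abstract.
"""


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a binary test via Bayes' theorem.

    Each term is the expected fraction of the screened population
    falling into that cell of the confusion matrix.
    """
    tp = sensitivity * prevalence              # true positives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


if __name__ == "__main__":
    prevalence = 0.04  # undiagnosed-diabetes prevalence reported in the abstract
    # Hypothetical sensitivity/specificity pairs, for illustration only.
    for sens, spec in [(0.80, 0.60), (0.80, 0.90), (0.80, 0.99)]:
        ppv, npv = predictive_values(sens, spec, prevalence)
        print(f"sens={sens:.2f} spec={spec:.2f} -> PPV={ppv:.3f} NPV={npv:.3f}")
```

Because false positives are drawn from the 96% of participants without undiagnosed diabetes, even a moderate false-positive rate swamps the true positives, which keeps the PPV low while the NPV stays close to 1.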

Published in Diabetology

ISSN: 2673-4540 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Specialties of internal medicine: Diseases of the blood and blood-forming organs
Website: https://www.mdpi.com/journal/diabetology
