Development of Various Diabetes Prediction Models Using Machine Learning Techniques

Juyoung Shin; Jaewon Kim; Chanjung Lee; Joon Young Yoon; Seyeon Kim; Seungjae Song; Hun-Sung Kim

doi:10.4093/dmj.2021.0115

Diabetes & Metabolism Journal (Jul 2022)

Affiliations

Juyoung Shin: Health Promotion Center, Seoul St. Mary’s Hospital, Seoul, Korea
Jaewon Kim: LifeSemantics Corp., Seoul, Korea
Chanjung Lee: LifeSemantics Corp., Seoul, Korea
Joon Young Yoon: LifeSemantics Corp., Seoul, Korea
Seyeon Kim: LifeSemantics Corp., Seoul, Korea
Seungjae Song: LifeSemantics Corp., Seoul, Korea
Hun-Sung Kim: Department of Endocrinology and Metabolism, College of Medicine, The Catholic University of Korea, Seoul, Korea

DOI: https://doi.org/10.4093/dmj.2021.0115
Journal volume & issue: Vol. 46, no. 4, pp. 650–657

Abstract

Background: Many models for predicting diabetes mellitus (DM) exist, but their clinical implications remain unclear. Therefore, we aimed to create various DM prediction models using easily accessible health screening test parameters.

Methods: Two sets of variables were used to develop eight DM prediction models. The first set comprised 62 commonly used, easily accessible examination results from a tertiary university hospital. The second set comprised the 27 of those 62 variables that are included in national routine health checkups. Gradient boosting and random forest algorithms were used to develop the models. Internal validation was performed using stratified 10-fold cross-validation.

Results: Among the eight DM prediction models, the 62-variable model making 12-month predictions for subjects without diabetes had the largest area under the receiver operating characteristic curve (ROC-AUC), at 0.928. The ROC-AUC dropped by more than 0.04 when training with the simplified 27-variable set, but performance remained fairly good, with ROC-AUCs between 0.842 and 0.880. Accuracy fell by up to 11.5% (from 0.807 to 0.714) when fasting glucose was excluded.

Conclusion: We created easily applicable diabetes prediction models that deliver good performance using parameters commonly assessed during tertiary university hospital and national routine health checkups. We plan to perform prospective external validation, in the hope that the developed DM prediction models will be widely used in clinical practice.
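The validation scheme named in the abstract, stratified 10-fold cross-validation scored by ROC-AUC, can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the single feature and its distributions are invented, and a toy nearest-centroid scorer stands in for the gradient boosting and random forest models, which are not reproduced here. All function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds while keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroids(xs, ys):
    """Toy stand-in model: remember the mean feature value of each class."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def score(x, centroids):
    """Higher score means x lies closer to the positive-class centroid."""
    mu_pos, mu_neg = centroids
    return abs(x - mu_neg) - abs(x - mu_pos)


# Synthetic data (assumption): 20 positives and 180 negatives with one
# glucose-like feature; these numbers are made up for illustration only.
rng = random.Random(42)
labels = [1] * 20 + [0] * 180
values = [rng.gauss(125, 15) if y == 1 else rng.gauss(100, 10) for y in labels]

folds = stratified_k_folds(labels, k=10, seed=0)
fold_aucs = []
for held_out in folds:
    held = set(held_out)
    train = [i for i in range(len(labels)) if i not in held]
    # Fit on the nine training folds only, then score the held-out fold.
    centroids = fit_centroids([values[i] for i in train], [labels[i] for i in train])
    fold_scores = [score(values[i], centroids) for i in held_out]
    fold_aucs.append(roc_auc([labels[i] for i in held_out], fold_scores))

mean_auc = sum(fold_aucs) / len(fold_aucs)
print(f"mean ROC-AUC over {len(folds)} folds: {mean_auc:.3f}")
```

Stratifying matters when the outcome is uncommon: each held-out fold then contains both classes, so a per-fold ROC-AUC is always defined. In practice a library implementation of the splitter, the metric, and the tree ensembles would likely replace these hand-written helpers.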

Published in Diabetes & Metabolism Journal

ISSN: 2233-6079 (Print); 2233-6087 (Online)
Publisher: Korean Diabetes Association
Country of publisher: Korea, Republic of
LCC subjects: Medicine: Internal medicine: Specialties of internal medicine: Diseases of the endocrine glands. Clinical endocrinology
Website: http://e-dmj.org/
