Scientific Reports (Jun 2024)
Prediction model for cardiovascular disease in patients with diabetes using machine learning derived and validated in two independent Korean cohorts
Abstract
Abstract This study aimed to develop and validate a machine learning (ML) model tailored to the Korean population with type 2 diabetes mellitus (T2DM) to provide a superior method for predicting the development of cardiovascular disease (CVD), a major chronic complication in these patients. We used data from two cohorts, namely the discovery (one hospital; n = 12,809) and validation (two hospitals; n = 2019) cohorts, recruited between 2008 and 2022. The outcome of interest was the presence or absence of CVD at 3 years. We selected various ML-based models with hyperparameter tuning in the discovery cohort and performed area under the receiver operating characteristic curve (AUROC) analysis in the validation cohort. CVD was observed in 1238 (10.2%) patients in the discovery cohort. The random forest (RF) model exhibited the best overall performance among the models, with an AUROC of 0.830 (95% confidence interval [CI] 0.818–0.842) in the discovery dataset and 0.722 (95% CI 0.660–0.783) in the validation dataset. Creatinine and glycated hemoglobin levels were the most influential factors in the RF model. This study introduces a pioneering ML-based model for predicting CVD in Korean patients with T2DM, outperforming existing prediction tools and providing a groundbreaking approach for early personalized preventive medicine.
Keywords