Journal of Translational Medicine (Mar 2022)

Prediction of 3-year risk of diabetic kidney disease using machine learning based on electronic medical records

  • Zheyi Dong,
  • Qian Wang,
  • Yujing Ke,
  • Weiguang Zhang,
  • Quan Hong,
  • Chao Liu,
  • Xiaomin Liu,
  • Jian Yang,
  • Yue Xi,
  • Jinlong Shi,
  • Li Zhang,
  • Ying Zheng,
  • Qiang Lv,
  • Yong Wang,
  • Jie Wu,
  • Xuefeng Sun,
  • Guangyan Cai,
  • Shen Qiao,
  • Chengliang Yin,
  • Shibin Su,
  • Xiangmei Chen

DOI
https://doi.org/10.1186/s12967-022-03339-1
Journal volume & issue
Vol. 20, no. 1
pp. 1 – 10

Abstract

Background: Established prediction models of diabetic kidney disease (DKD) are limited to the analysis of clinical research data or general population data and do not consider hospital visits. This study aimed to construct a 3-year DKD risk prediction model for patients with type 2 diabetes mellitus (T2DM) using machine learning, based on electronic medical records (EMR).

Methods: Data were obtained from 816 patients (585 males) with T2DM and 3 years of follow-up at the PLA General Hospital. Forty-six medical characteristics that are readily available from EMR were used to develop prediction models based on seven machine learning algorithms (light gradient boosting machine [LightGBM], eXtreme gradient boosting, adaptive boosting, artificial neural network, decision tree, support vector machine, and logistic regression). Model performance was evaluated using the area under the receiver operating characteristic curve (AUC). Shapley additive explanation (SHAP) was used to interpret the results of the best-performing model.

Results: The LightGBM model had the highest AUC (0.815, 95% CI 0.747–0.882). Recursive feature elimination with random forest and the SHAP plot based on LightGBM showed that older patients with T2DM who had high homocysteine (Hcy), poor glycemic control, low serum albumin (ALB), low estimated glomerular filtration rate (eGFR), and high bicarbonate had an increased risk of developing DKD over the next 3 years.

Conclusions: This study constructed a 3-year DKD risk prediction model for patients with T2DM and normo-albuminuria using machine learning and EMR. The LightGBM model is a tool with the potential to facilitate population management strategies for T2DM care in the EMR era.
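The abstract reports model performance as an AUC with a 95% confidence interval. As a rough illustration of how such a figure can be produced, the sketch below computes the AUC from predicted risk scores using the Mann-Whitney formulation and attaches a percentile bootstrap interval. The abstract does not say how the interval was derived, so the bootstrap is an assumption here. The function names (`auc`, `bootstrap_auc_ci`) and the toy labels and scores are hypothetical and are not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class, so AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Toy data invented for illustration: 1 = developed DKD, 0 = did not.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.45, 0.60, 0.70, 0.20, 0.90]

point = auc(labels, scores)
low, high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.3f} (95% CI {low:.3f}-{high:.3f})")
```

In practice the scores would be the predicted probabilities from each of the seven fitted models on held-out patients, and the model with the highest AUC would then be passed on for interpretation.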

Keywords