BMC Medical Informatics and Decision Making (Aug 2023)

Risk factor mining and prediction of urine protein progression in chronic kidney disease: a machine learning-based study

  • Yufei Lu,
  • Yichun Ning,
  • Yang Li,
  • Bowen Zhu,
  • Jian Zhang,
  • Yan Yang,
  • Weize Chen,
  • Zhixin Yan,
  • Annan Chen,
  • Bo Shen,
  • Yi Fang,
  • Dong Wang,
  • Nana Song,
  • Xiaoqiang Ding

DOI
https://doi.org/10.1186/s12911-023-02269-2
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 17

Abstract

Background: Chronic kidney disease (CKD) is a global public health concern. To provide timely intervention for non-hospitalized high-risk patients and to allocate limited clinical resources rationally, it is important to mine the key factors when designing a CKD prediction model.

Methods: This study included data from 1,358 patients with pathologically confirmed CKD, collected from December 2017 to September 2020 at Zhongshan Hospital. A CKD prediction interpretation framework based on machine learning was proposed. From among 100 variables, 17 were selected for model construction through recursive feature elimination with logistic regression. Several machine learning classifiers, including extreme gradient boosting, Gaussian naive Bayes, a neural network, ridge regression, and logistic regression (LR), were trained, and an ensemble model was developed to predict 24-hour urine protein. The detailed relationship between the risk of CKD progression and these predictors was determined using a global interpretation, and a patient-specific analysis was conducted using a local interpretation.

Results: Among the single machine learning models, LR achieved the best performance, with an area under the curve (AUC) of 0.850. The ensemble model constructed using the voting integration method further improved the AUC to 0.856. The major predictors of moderate-to-severe severity included lower levels of 25-OH-vitamin, albumin, and transferrin in males, and higher levels of cystatin C.

Conclusions: Compared with the single kidney function indicators used clinically (eGFR, Scr), the machine learning model proposed in this study improved the prediction accuracy of CKD progression by 17.6% and 24.6%, respectively, and improved the AUC by 0.250 and 0.236, respectively. Our framework can achieve a good predictive interpretation and provide effective clinical decision support.
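To make the voting step and the AUC metric concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say whether voting was hard or soft, so soft voting (averaging predicted probabilities) is assumed here. The function names `soft_vote` and `auc`, the model keys, and all numbers are hypothetical toy values invented for illustration.

```python
from itertools import product
from statistics import mean


def soft_vote(prob_lists):
    """Average the positive-class probability per patient across models."""
    return [mean(per_patient) for per_patient in zip(*prob_lists)]


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tied pair counts as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the study).
# 1 = moderate-to-severe class, 0 = otherwise.
labels = [1, 1, 1, 0, 0, 0]
model_probs = {
    "lr": [0.9, 0.6, 0.4, 0.5, 0.2, 0.1],
    "nb": [0.7, 0.3, 0.8, 0.4, 0.6, 0.2],
    "xgb": [0.8, 0.7, 0.5, 0.3, 0.4, 0.6],
}

ensemble = soft_vote(list(model_probs.values()))

for name, probs in model_probs.items():
    print(name, round(auc(labels, probs), 3))
print("ensemble", round(auc(labels, ensemble), 3))
```

In this toy case each model misranks a different patient, so averaging cancels the individual errors and the ensemble ranks every positive above every negative. Real data would typically show a much smaller gain, as with the reported change from 0.850 to 0.856.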

Keywords