Interpretable machine learning for predicting chronic kidney disease progression risk

Jin-Xin Zheng; Xin Li; Jiang Zhu; Shi-Yang Guan; Shun-Xian Zhang; Wei-Ming Wang

doi:10.1177/20552076231224225

Digital Health (Jan 2024)

Affiliations

Jin-Xin Zheng: Department of Nephrology, Ruijin Hospital, Institute of Nephrology, Shanghai, China
Xin Li: Department of Nephrology, Ruijin Hospital, Institute of Nephrology, Shanghai, China
Jiang Zhu: Liver Transplantation Center, West China Hospital, Sichuan University, Chengdu, China
Shi-Yang Guan: Hefei, Anhui, China
Shun-Xian Zhang: Clinical Research Center, Longhua Hospital, Shanghai, China
Wei-Ming Wang: Department of Nephrology, Ruijin Hospital, Institute of Nephrology, Shanghai, China

DOI: https://doi.org/10.1177/20552076231224225
Journal volume & issue: Vol. 10

Abstract


Objective: Chronic kidney disease (CKD) poses a major global health burden. Early CKD risk prediction enables timely interventions, but conventional models have limited accuracy. Machine learning (ML) enhances prediction, but interpretability is needed to support clinical use in both diagnosis and decision-making.

Methods: A cohort of 491 patients with clinical data was collected for this study. The dataset was randomly split into a training set (80%) and a testing set (20%). For the classification task, we developed four ML algorithms (logistic regression, random forests, neural networks, and eXtreme Gradient Boosting (XGBoost)) to classify patients into two classes: those who progressed to CKD stages 3–5 during follow-up (positive class) and those who did not (negative class). The area under the receiver operating characteristic curve (AUC-ROC) was used to evaluate how well each model discriminated between the two classes. For survival analysis, Cox proportional hazards regression (COX) and random survival forests (RSFs) were employed to predict CKD progression, and the concordance index (C-index) and integrated Brier score were used for model evaluation. Furthermore, variable importance, partial dependence plots, and restricted cubic splines were used to interpret the models' results.

Results: XGBoost demonstrated the best predictive performance for CKD progression in the classification task, with an AUC-ROC of 0.867 (95% confidence interval (CI): 0.728–1.000), outperforming the other ML algorithms. In survival analysis, RSF showed slightly better discrimination and calibration on the test set than COX, indicating better generalization to new data. Variable importance analysis identified estimated glomerular filtration rate, age, and creatinine as the most important predictors in the CKD survival analysis. Further analysis revealed non-linear associations between age and CKD progression, suggesting higher risks in patients aged 52–55 and 65–66 years. The association between cholesterol levels and CKD progression was also non-linear, with lower risks observed when cholesterol levels were in the range of 5.8–6.4 mmol/L.

Conclusions: Our study demonstrated the effectiveness of interpretable ML models for predicting CKD progression. The comparison between COX and RSF highlighted the advantages of ML in survival analysis, particularly in handling non-linearity and high-dimensional data. By using interpretable ML to unravel risk factor relationships, contrast predictive techniques, and expose non-linear associations, this study advances CKD risk prediction and supports better-informed clinical decision-making.
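
The abstract names an 80%/20% random split and two discrimination metrics, AUC-ROC for the classification task and the C-index for the survival task. The following is a minimal sketch of how these quantities can be computed, using only the Python standard library. It is not the authors' code: the function names (`split_80_20`, `auc_roc`, `c_index`), the rounding rule for the split, and the toy values are assumptions made for illustration, and the C-index shown is Harrell's pairwise form, which may differ from the estimator used in the study.

```python
import random


def split_80_20(rows, test_fraction=0.2, seed=0):
    """Randomly split rows into (train, test); the rounding rule is an assumption."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def c_index(times, events, risks):
    """Harrell's C-index: among comparable pairs, the fraction in which the
    patient who progressed earlier was assigned the higher predicted risk.
    A pair (i, j) is comparable when i had an observed event before time j."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # censored patients cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy values, invented for illustration only (not study data).
train, test = split_80_20(range(491))          # 491 is the cohort size in the abstract
toy_auc = auc_roc([1, 1, 0, 0, 1, 0], [0.9, 0.6, 0.4, 0.7, 0.8, 0.2])
toy_c = c_index([2, 5, 3, 8], [1, 0, 1, 1], [0.9, 0.7, 0.6, 0.1])
print(len(train), len(test), round(toy_auc, 3), round(toy_c, 3))
```

Both metrics share the same reading: 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The C-index extends the pairwise idea behind AUC-ROC to follow-up data with censoring, which is why the abstract pairs it with the survival models.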

Published in Digital Health

ISSN: 2055-2076 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://journals.sagepub.com/home/dhj
