Predicting chronic kidney disease progression using small pathology datasets and explainable machine learning models

Sandeep Reddy; Supriya Roy; Kay Weng Choy; Sourav Sharma; Karen M Dwyer; Chaitanya Manapragada; Zane Miller; Joy Cheon; Bahareh Nakisa

Computer Methods and Programs in Biomedicine Update (Jan 2024)

Affiliations

Sandeep Reddy: School of Medicine, Deakin University, Geelong, Australia; Corresponding author at: School of Medicine, Deakin University, Waurn Ponds, Victoria 3215, Australia.
Supriya Roy: School of Information Technology, Deakin University, Geelong, Australia
Kay Weng Choy: Northern Health, Melbourne, Australia
Sourav Sharma: School of Information Technology, Deakin University, Geelong, Australia
Karen M Dwyer: The Royal Melbourne Hospital, Melbourne, Australia
Chaitanya Manapragada: School of Medicine, Deakin University, Geelong, Australia
Zane Miller: School of Medicine, University of Melbourne, Australia
Joy Cheon: School of Medicine, University of Melbourne, Australia
Bahareh Nakisa: School of Information Technology, Deakin University, Geelong, Australia

Journal volume & issue: Vol. 6, p. 100160

Abstract

Background: Chronic kidney disease (CKD) poses a major global public health burden, with over 700 million people affected. Early identification of those in whom the disease is likely to progress enables timely therapeutic interventions to delay advancement to kidney failure.
Methods: This study developed explainable machine learning models leveraging pathology data to predict CKD trajectory, targeting improved prognostic capability even in early stages using limited datasets. Key variables include age, gender, most recent estimated glomerular filtration rate (eGFR), mean eGFR, and eGFR slope over time prior to the incidence of kidney failure. The supervised classification techniques were decision tree and random forest algorithms, selected for interpretability. Internal validation was performed on an Australian tertiary centre cohort (n = 706; 353 with kidney failure and 353 without). To address the inherent class imbalance, centroid-cluster-based under-sampling was applied to the Australian dataset. For external validation, the model was applied to a dataset (n = 597 adults) sourced from a Japanese CKD registry. Transfer learning was subsequently employed by fine-tuning the machine learning models on 15% of the external dataset (n = 89) before evaluating them on the remaining 508 patients.
Results: Internal validation achieved exceptional predictive accuracy, with the area under the receiver operating characteristic curve (ROC-AUC) reaching 0.94 and 0.98 on the binary task of predicting kidney failure for decision tree and random forest, respectively. External validation yielded an ROC-AUC of 0.88 for the decision tree and 0.93 for the random forest model. Decision tree model analysis revealed the most recent eGFR and eGFR slope as the most informative variables for prediction in the Japanese cohort.
Conclusion: The research highlights the utility of deploying explainable machine learning techniques to forecast CKD trajectory even in the early stages utilising limited real-world datasets.
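
As an illustration of the eGFR-derived variables named in the Methods (most recent eGFR, mean eGFR, and eGFR slope), the sketch below computes all three for a single patient. The abstract does not state how the slope was estimated, so the ordinary least-squares fit used here is an assumption. The function name `egfr_features`, the time unit (years), and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean


def egfr_features(times_years, egfr_values):
    """Return (most_recent, mean_egfr, slope) for one patient's eGFR series.

    Assumption: slope is the ordinary least-squares slope of eGFR against
    time, so its unit is eGFR units per year. The study may have used a
    different estimator.
    """
    if len(times_years) != len(egfr_values) or len(times_years) < 2:
        raise ValueError("need at least two paired measurements")

    # Sort by time so that the last entry is the most recent measurement.
    pairs = sorted(zip(times_years, egfr_values))
    t = [p[0] for p in pairs]
    y = [p[1] for p in pairs]

    t_bar, y_bar = mean(t), mean(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("measurements must span more than one time point")

    # Least-squares slope: covariance of (t, y) divided by variance of t.
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return y[-1], y_bar, slope


# Hypothetical patient: eGFR falling by 5 units per year over three years.
features = egfr_features([0.0, 1.0, 2.0, 3.0], [60.0, 55.0, 50.0, 45.0])
print(features)
```

A negative slope indicates declining kidney function. Together with age and gender, the three values would form one feature row for a classifier such as a decision tree or random forest.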

Published in Computer Methods and Programs in Biomedicine Update

ISSN: 2666-9900 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.journals.elsevier.com/computer-methods-and-programs-in-biomedicine-update
