Explainable machine learning for early predicting treatment failure risk among patients with TB-diabetes comorbidity

An-zhou Peng; Xiang-Hua Kong; Song-tao Liu; Hui-fen Zhang; Ling-ling Xie; Li-juan Ma; Qiu Zhang; Yong Chen

doi:10.1038/s41598-024-57446-8

Scientific Reports (Mar 2024)

Affiliations

An-zhou Peng: Department of the Fifth Tuberculosis, Chongqing Public Health Medical Center
Xiang-Hua Kong: Department of the Fifth Tuberculosis, Chongqing Public Health Medical Center
Song-tao Liu: Department of the Fifth Tuberculosis, Chongqing Public Health Medical Center
Hui-fen Zhang: Department of the Fifth Tuberculosis, Chongqing Public Health Medical Center
Ling-ling Xie: Department of the Fifth Tuberculosis, Chongqing Public Health Medical Center
Li-juan Ma: Department of the Fifth Tuberculosis, Chongqing Public Health Medical Center
Qiu Zhang: Department of Endocrinology, First Affiliated Hospital of Anhui Medical University
Yong Chen: Department of Endocrinology, First Affiliated Hospital of Anhui Medical University

Journal volume & issue: Vol. 14, no. 1
pp. 1 – 11

Abstract

The present study aims to assess the treatment outcome of patients with diabetes and tuberculosis (TB-DM) at an early stage using machine learning (ML) based on electronic medical records (EMRs). A total of 429 patients were included at Chongqing Public Health Medical Center. The random-forest-based Boruta algorithm was employed to select the essential variables, and four models were trained and evaluated with a fivefold cross-validation scheme. Furthermore, we adopted SHapley Additive exPlanations (SHAP) to interpret results from the tree-based model. Nine of 69 candidate features were chosen as predictors. Among these predictors, the type of resistance was the most important feature, followed by activated partial thromboplastin time (APTT), thrombin time (TT), platelet distribution width (PDW), and prothrombin time (PT). All the models we established achieved an AUC above 0.7, indicating good predictive performance. XGBoost, the best-performing model, predicted the risk of treatment failure in the test set with an AUC of 0.9281. These results suggest that the machine learning approach (XGBoost) presented here can identify patients with TB-DM at higher risk of treatment failure at an early stage based on EMRs. A convenient and economical EMR-based machine learning approach provides new insight into TB-DM treatment strategies in low- and middle-income countries.
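The evaluation scheme named in the abstract (fivefold cross-validation scored by AUC) can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' code: the function names `auc`, `stratified_folds`, and `cross_validated_auc` are hypothetical, the `fit` callback stands in for any of the four models, and the sketch assumes binary labels (1 = treatment failure) with at least `k` samples in each class.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with a roughly equal class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(labels, features, fit, k=5, seed=0):
    """Mean held-out AUC over k folds.

    `fit(train_features, train_labels)` is a hypothetical callback that
    returns a function mapping one feature row to a risk score.
    """
    results = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        score = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        results.append(auc([labels[i] for i in test_idx],
                           [score(features[i]) for i in test_idx]))
    return sum(results) / len(results)
```

In practice `fit` would wrap a real learner (the study reports XGBoost as the best of its four models); the sketch only shows how held-out AUC is averaged across folds.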

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
