Scientific Reports (Mar 2024)

Explainable machine learning for early predicting treatment failure risk among patients with TB-diabetes comorbidity

  • An-zhou Peng,
  • Xiang-Hua Kong,
  • Song-tao Liu,
  • Hui-fen Zhang,
  • Ling-ling Xie,
  • Li-juan Ma,
  • Qiu Zhang,
  • Yong Chen

DOI
https://doi.org/10.1038/s41598-024-57446-8
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 11

Abstract


The present study aims to assess, at an early stage, the treatment outcome of patients with comorbid diabetes and tuberculosis (TB-DM) using machine learning (ML) based on electronic medical records (EMRs). A total of 429 patients at Chongqing Public Health Medical Center were included. The random-forest-based Boruta algorithm was employed to select the essential variables, and four models were built and evaluated with a fivefold cross-validation scheme. Furthermore, SHapley Additive exPlanations (SHAP) were used to interpret the results of the tree-based model. Nine of the 69 candidate features were chosen as predictors. Among these predictors, the type of resistance was the most important feature, followed by activated partial thromboplastin time (APTT), thrombin time (TT), platelet distribution width (PDW), and prothrombin time (PT). All the models achieved an AUC above 0.7, indicating good predictive performance. XGBoost, the best-performing model, predicted the risk of treatment failure in the test set with an AUC of 0.9281. These results suggest that the machine learning approach (XGBoost) presented in this study can identify TB-DM patients at higher risk of treatment failure at an early stage based on EMRs. The application of a convenient and economical EMR-based machine learning approach provides new insight into TB-DM treatment strategies in low- and middle-income countries.
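The abstract reports model quality as AUC under fivefold cross-validation but gives no code. The following is a minimal, dependency-free sketch of that evaluation idea, assuming binary labels (1 = treatment failure, 0 = success) and a stratified split so each fold keeps the class balance. The scorer `fit_centroid_scorer` is a hypothetical stand-in (a simple nearest-centroid linear score), not the authors' XGBoost or Boruta pipeline, and the data in the usage example are synthetic.

```python
import random
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[float]], float]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def stratified_kfold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class round-robin so proportions match."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_centroid_scorer(X: Sequence[Sequence[float]], y: Sequence[int]) -> Scorer:
    """Hypothetical stand-in model: score(x) = x . (mean of positives - mean of negatives)."""
    d = len(X[0])
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    w = [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg) for j in range(d)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def cross_validated_auc(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 5, seed: int = 0) -> float:
    """Fit on k-1 folds, score the held-out fold, and average the k test-fold AUCs."""
    aucs = []
    for test_idx in stratified_kfold(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        score = fit_centroid_scorer([X[i] for i in train_idx], [y[i] for i in train_idx])
        aucs.append(roc_auc([y[i] for i in test_idx], [score(X[i]) for i in test_idx]))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Synthetic data: feature 0 is informative, feature 1 is pure noise.
    rng = random.Random(42)
    y = [i % 2 for i in range(100)]
    X = [[t + rng.gauss(0, 0.5), rng.gauss(0, 1)] for t in y]
    print(round(cross_validated_auc(X, y), 3))
```

Averaging per-fold AUCs is one common convention; pooling all out-of-fold scores into a single AUC is another, and the abstract does not say which was used.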