iScience (Sep 2025)

International multicenter development of ensemble machine learning driven host response based diagnosis for tuberculosis

  • Shufa Zheng,
  • Wenxin Qu,
  • Dan Zhang,
  • Jieting Zhou,
  • Yifan Xu,
  • Wei Wu,
  • Chang Liu,
  • Mingzhu Huang,
  • Enhui Shen,
  • Xiao Chen,
  • Michael P. Timko,
  • Longjiang Fan,
  • Fei Yu,
  • Dongsheng Han,
  • Yifei Shen

DOI
https://doi.org/10.1016/j.isci.2025.113444
Journal volume & issue
Vol. 28, no. 9
p. 113444

Abstract

Summary: Active pulmonary tuberculosis (ATB) is challenging to diagnose, and monitoring treatment response effectively remains difficult. To address these challenges, we developed TB-Scope, an ensemble machine learning classification model based on host gene expression. Using large-scale microarray datasets (N = 1,258) from three retrospective transcriptomic studies, we selected 143 feature genes (biomarkers) based on their expression ranks to predict ATB. The Top Scoring Pairs (TSP) ensemble classifier for ATB diagnosis was optimized using multi-cohort training samples. We then combined the ATB/Health, ATB/LTBI (latent tuberculosis infection), and ATB/ODs (other diseases) classifiers to construct an ATB diagnosis decision model (TB-Scope decision). To assess the performance of the TB-Scope algorithm and decision model, we analyzed 12 independent microarray and RNA-seq validation datasets (N = 1,786) comprising both children and adults from seven countries. Our data demonstrate that TB-Scope provides a powerful and reliable tool for accurately diagnosing ATB across diverse data platforms.
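
The rank-based idea behind a Top Scoring Pairs classifier can be illustrated with a minimal sketch. This is not the TB-Scope implementation: the gene names (G1 to G4), the pair lists, the 0.5 vote threshold, and the rule that all three classifiers must agree are assumptions made purely for illustration, since the abstract does not specify them. Each pair votes by comparing two expression values within one sample, so only the within-sample ordering matters, which is one reason such classifiers can transfer between microarray and RNA-seq data.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (gene_a, gene_b); the pair votes "ATB" when gene_a > gene_b


def tsp_score(expr: Dict[str, float], pairs: List[Pair]) -> float:
    """Return the fraction of pairs whose first gene is expressed above the second."""
    if not pairs:
        raise ValueError("at least one gene pair is required")
    votes = sum(1 for a, b in pairs if expr[a] > expr[b])
    return votes / len(pairs)


def tsp_predict(expr: Dict[str, float], pairs: List[Pair], threshold: float = 0.5) -> bool:
    """Call the sample ATB when more than `threshold` of the pairs vote for it."""
    return tsp_score(expr, pairs) > threshold


def decide_atb(
    expr: Dict[str, float],
    classifiers: Dict[str, List[Pair]],
    threshold: float = 0.5,
) -> bool:
    """Hypothetical decision rule: ATB only if every binary classifier says ATB."""
    return all(tsp_predict(expr, pairs, threshold) for pairs in classifiers.values())


# Hypothetical expression values for one sample (placeholder gene names).
sample = {"G1": 5.0, "G2": 1.0, "G3": 2.0, "G4": 3.0}

# Hypothetical pair lists, one per binary comparison named in the abstract.
classifiers = {
    "ATB/Health": [("G1", "G2")],
    "ATB/LTBI": [("G1", "G3")],
    "ATB/ODs": [("G3", "G4")],
}

print(tsp_score(sample, [("G1", "G2"), ("G3", "G4"), ("G1", "G4")]))
print(decide_atb(sample, classifiers))
```

In this toy sample the ATB/ODs comparison votes against ATB, so the combined rule returns a negative call even though the other two classifiers are positive. A real decision model could weigh the three outputs differently.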

Keywords