Comparison of gene set scoring methods for reproducible evaluation of tuberculosis gene signatures

Xutao Wang; Arthur VanValkenberg; Aubrey R. Odom; Jerrold J. Ellner; Natasha S. Hochberg; Padmini Salgame; Prasad Patil; W. Evan Johnson

doi:10.1186/s12879-024-09457-z

BMC Infectious Diseases (Jun 2024)

Affiliations

Xutao Wang: Department of Biostatistics, Boston University
Arthur VanValkenberg: Division of Infectious Disease, Center for Data Science, Rutgers New Jersey Medical School
Aubrey R. Odom: Division of Computational Biomedicine and Bioinformatics Program, Boston University
Jerrold J. Ellner: Department of Medicine, Center for Emerging Pathogens, Rutgers New Jersey Medical School
Natasha S. Hochberg: Boston Medical Center
Padmini Salgame: Department of Medicine, Center for Emerging Pathogens, Rutgers New Jersey Medical School
Prasad Patil: Department of Biostatistics, Boston University
W. Evan Johnson: Division of Infectious Disease, Center for Data Science, Rutgers New Jersey Medical School

DOI: https://doi.org/10.1186/s12879-024-09457-z
Journal volume & issue: Vol. 24, no. 1
pp. 1 – 10

Abstract

Background: Blood-based transcriptional gene signatures for tuberculosis (TB) have been developed with potential use in diagnosing disease. However, an unresolved issue is whether gene set enrichment analysis of the signature transcripts alone is sufficient for prediction and differentiation, or whether it is necessary to use the original model created when the signature was derived. Intra-method comparison is complicated by the unavailability of original training data and missing details about the original trained model. To facilitate the utilization of these signatures in TB research, comparisons between gene set scoring methods and cross-data validation of original model implementations are needed.

Methods: We compared the performance of 19 TB gene signatures across 24 transcriptomic datasets using both rebuilt original models and gene set scoring methods. Existing gene set scoring methods, including ssGSEA, GSVA, PLAGE, Singscore, and Zscore, were used as alternative approaches to obtain the profile scores. The area under the ROC curve (AUC) value was computed to measure performance. Correlation analysis and Wilcoxon paired tests were used to compare the performance of enrichment methods with the original models.

Results: For many signatures, the predictions from gene set scoring methods were highly correlated with, and statistically equivalent to, the results given by the original models. In some cases, PLAGE outperformed the original models when considering signatures’ weighted mean AUC values and the AUC results within individual studies.

Conclusion: Gene set enrichment scoring of existing gene sets can distinguish patients with active TB disease from other clinical conditions with equivalent or improved accuracy compared to the original methods and models. These data justify using gene set scoring methods of published TB gene signatures for predicting TB risk and treatment outcomes, especially when original models are difficult to apply or implement.
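
To make the evaluation described in the Methods concrete, the following is a minimal sketch of one of the named approaches (a Zscore-style gene set score) followed by an AUC calculation. It is not the authors' pipeline: the gene names, expression values, and the helper functions `zscore_gene_set_scores` and `auc` are hypothetical, the per-gene standardization uses the sample standard deviation as an assumption, and the combined score is taken as the sum of per-gene z-scores divided by the square root of the number of genes used.

```python
from math import sqrt
from statistics import mean, stdev


def zscore_gene_set_scores(expr, gene_set):
    """Return one combined z-score per sample for the genes in gene_set.

    expr maps a gene name to a list of expression values, one per sample.
    Genes missing from expr, or constant across samples, are skipped.
    """
    genes = [g for g in gene_set if g in expr]
    if not genes:
        raise ValueError("no signature genes found in the expression data")
    n_samples = len(expr[genes[0]])
    totals = [0.0] * n_samples
    used = 0
    for gene in genes:
        values = expr[gene]
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # a constant gene cannot be standardized
        used += 1
        for i, value in enumerate(values):
            totals[i] += (value - mu) / sd
    if used == 0:
        raise ValueError("all signature genes are constant across samples")
    # Combine per-gene z-scores: sum divided by sqrt(number of genes used).
    return [total / sqrt(used) for total in totals]


def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive sample scores higher; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four samples, label 1 = active TB, 0 = other.
expr = {
    "GENE_A": [1.0, 2.0, 5.0, 6.0],
    "GENE_B": [0.5, 1.5, 4.0, 4.5],
    "GENE_C": [3.0, 3.0, 3.0, 3.0],  # constant, so it is skipped
}
labels = [0, 0, 1, 1]
scores = zscore_gene_set_scores(expr, ["GENE_A", "GENE_B", "GENE_C", "GENE_X"])
print(auc(scores, labels))
```

Because a score of this kind needs only the signature's gene list and an expression matrix, it can be applied to a new dataset without access to the original training data or fitted model, which is the practical situation the abstract describes.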

Published in BMC Infectious Diseases

ISSN: 1471-2334 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Infectious and parasitic diseases
Website: https://bmcinfectdis.biomedcentral.com
