Scientific Reports (Jul 2023)
Comparison of in silico predicted Mycobacterium tuberculosis spoligotypes and lineages from whole genome sequencing data
Abstract
Abstract Bacterial strain-types in the Mycobacterium tuberculosis complex underlie tuberculosis disease, and have been associated with drug resistance, transmissibility, virulence, and host–pathogen interactions. Spoligotyping was developed as a molecular genotyping technique used to determine strain-types, though recent advances in whole genome sequencing (WGS) technology have led to their characterization using SNP-based sub-lineage nomenclature. Notwithstanding, spoligotyping remains an important tool and there is a need to study the congruence between spoligotyping-based and SNP-based sub-lineage assignation. To achieve this, an in silico spoligotype prediction method (“Spolpred2”) was developed and integrated into TB-Profiler. Lineage and spoligotype predictions were generated for > 28 k isolates and the overlap between strain-types was characterized. Major spoligotype families detected were Beijing (25.6%), T (18.6%), LAM (13.1%), CAS (9.4%), and EAI (8.3%), and these broadly followed known geographic distributions. Most spoligotypes were perfectly correlated with the main MTBC lineages (L1-L7, plus animal). Conversely, at lower levels of the sub-lineage system, the relationship breaks down, with only 65% of spoligotypes being perfectly associated with a sub-lineage at the second or subsequent levels of the hierarchy. Our work supports the use of spoligotyping (membrane or WGS-based) for low-resolution surveillance, and WGS or SNP-based systems for higher-resolution studies.