Department of Systems Biology, Harvard Medical School, Boston, United States; Department of Biomedical Informatics, Harvard Medical School, Boston, United States
Luca Freschi
Department of Biomedical Informatics, Harvard Medical School, Boston, United States
Maximillian Marin
Department of Systems Biology, Harvard Medical School, Boston, United States; Department of Biomedical Informatics, Harvard Medical School, Boston, United States
L Elaine Epperson
Center for Genes, Environment and Health, National Jewish Health, Denver, United States
Melissa Smith
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, United States; Icahn Institute of Data Sciences and Genomics Technology, New York, United States
Irina Oussenko
Icahn Institute of Data Sciences and Genomics Technology, New York, United States
David Durbin
Mycobacteriology Reference Laboratory, Advanced Diagnostic Laboratories, National Jewish Health, Denver, United States
Michael Strong
Center for Genes, Environment and Health, National Jewish Health, Denver, United States
Max Salfinger
College of Public Health, University of South Florida, Tampa, United States; Morsani College of Medicine, University of South Florida, Tampa, United States
Maha Reda Farhat
Department of Biomedical Informatics, Harvard Medical School, Boston, United States; Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, United States
Tuberculosis (TB) is a leading cause of death globally. Understanding the in-host population dynamics of TB’s causative agent, Mycobacterium tuberculosis complex (Mtbc), is vital for understanding the efficacy of antibiotic treatment. We use longitudinally collected clinical Mtbc isolates from the sputa of 200 patients, each subjected to whole-genome sequencing, to investigate Mtbc diversity during the course of active TB disease after excluding 107 cases suspected of reinfection, mixed infection or contamination. Of the 178/200 patients with persistent clonal infection >2 months, 27 developed new resistance mutations between samplings, with 20/27 occurring in patients with pre-existing resistance. Low-abundance resistance variants at a purity of ≥19% in the first isolate predict fixation in the subsequent sample. We identify significant in-host variation in 27 genes, including antibiotic resistance genes, metabolic genes and genes known to modulate host innate immunity, and confirm several to be under positive selection by assessing phylogenetic convergence across a genetically diverse sample of 20,352 isolates.