Molecular diversity of Mycobacterium tuberculosis complex in Sikkim, India and prediction of dominant spoligotypes using artificial intelligence

Kangjam Rekha Devi; Jagat Pradhan; Rinchenla Bhutia; Peggy Dadul; Atanu Sarkar; Nitumoni Gohain; Kanwar Narain

doi:10.1038/s41598-021-86626-z

Scientific Reports (Apr 2021)

Affiliations

Kangjam Rekha Devi: N.E. Region, Indian Council of Medical Research (ICMR)-Regional Medical Research Centre
Jagat Pradhan: National Tuberculosis Elimination Programme (NTEP)
Rinchenla Bhutia: National Tuberculosis Elimination Programme (NTEP)
Peggy Dadul: Department of Health Care, Human Services and Family Welfare, State Tuberculosis Control Society
Atanu Sarkar: N.E. Region, Indian Council of Medical Research (ICMR)-Regional Medical Research Centre
Nitumoni Gohain: N.E. Region, Indian Council of Medical Research (ICMR)-Regional Medical Research Centre
Kanwar Narain: N.E. Region, Indian Council of Medical Research (ICMR)-Regional Medical Research Centre

DOI: https://doi.org/10.1038/s41598-021-86626-z
Journal volume & issue: Vol. 11, no. 1, pp. 1–16

Abstract

Tuberculosis is an enormous public health problem in India. This study provides the first description of the molecular diversity of the Mycobacterium tuberculosis complex (MTBC) in Sikkim, India. A total of 399 acid-fast bacilli (AFB) sputum-positive samples were cultured on Löwenstein–Jensen medium, and the isolates were genetically characterised by spoligotyping and 24-loci MIRU-VNTR typing. Spoligotyping revealed 58 different spoligotypes. The Beijing spoligotype was dominant, constituting 62.41% of all isolates, and was associated with multidrug resistance. Minimum spanning tree analysis of 249 Beijing strains, based on 24-loci MIRU-VNTR typing, identified 12 clonal complexes (single-locus variants). Principal component analysis was used to visualise possible grouping of MTBC isolates from Sikkim belonging to the major spoligotypes, using their 24-loci MIRU-VNTR profiles. Artificial intelligence-based machine learning (ML) methods such as Random Forests (RF), Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were used to predict the dominant spoligotypes of MTBC from MIRU-VNTR data. K-fold cross-validation and validation on an unseen testing data set showed high accuracy (93–99%) for ANN, RF, and SVM in predicting the Beijing, CAS1_Delhi, and T1 spoligotypes. However, prediction on a new external validation data set showed that the RF model was more accurate than SVM and ANN.
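To illustrate what grouping isolates by single-locus variants can look like, the following minimal sketch links any two MIRU-VNTR profiles that differ at no more than one locus and reports the resulting groups. It is an assumption-laden illustration, not the study's method: the abstract reports a minimum spanning tree analysis, whereas this sketch uses a simpler union-find linkage, and the function names, the six-locus profiles, and the isolate labels are all hypothetical rather than study data.

```python
from itertools import combinations


def locus_differences(a, b):
    """Count loci at which two MIRU-VNTR profiles carry different repeat numbers."""
    return sum(x != y for x, y in zip(a, b))


def clonal_complexes(profiles):
    """Group isolates by linking profiles that differ at no more than one locus.

    Assumption: a complex is a connected set of single-locus variants
    (identical profiles are linked too). Singletons are not reported.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if locus_differences(profiles[a], profiles[b]) <= 1:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical toy profiles (6 loci instead of 24, for brevity).
toy_profiles = {
    "iso1": (2, 4, 4, 2, 3, 3),
    "iso2": (2, 4, 4, 2, 3, 5),  # differs from iso1 at one locus
    "iso3": (2, 4, 3, 2, 3, 5),  # differs from iso2 at one locus
    "iso4": (1, 2, 5, 3, 2, 2),
    "iso5": (1, 2, 5, 3, 2, 3),  # differs from iso4 at one locus
    "iso6": (3, 3, 3, 3, 3, 3),  # no close relative: singleton
}

complexes = clonal_complexes(toy_profiles)
print(complexes)
```

In this toy data, iso1 and iso3 differ at two loci but still fall into the same group because each is a single-locus variant of iso2; this chaining behaviour is a design choice of the sketch and may differ from how dedicated typing software delimits complexes.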

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
