mSphere (Feb 2019)

Unsupervised Learning Approach for Comparing Multiple Transposon Insertion Sequencing Studies

  • Troy P. Hubbard,
  • Jonathan D. D’Gama,
  • Gabriel Billings,
  • Brigid M. Davis,
  • Matthew K. Waldor

DOI
https://doi.org/10.1128/mSphere.00031-19
Journal volume & issue
Vol. 4, no. 1

Abstract

ABSTRACT Transposon insertion sequencing (TIS) is a widely used technique for conducting genome-scale forward genetic screens in bacteria. However, few methods enable comparison of TIS data across multiple replicates of a screen or across independent screens, including screens performed in different organisms. Here, we introduce a post hoc analytic framework, comparative TIS (CompTIS), which utilizes unsupervised learning to enable meta-analysis of multiple TIS data sets. CompTIS first implements screen-level principal-component analysis (PCA) and clustering to identify variation between the TIS screens. This initial screen-level analysis facilitates the selection of related screens for additional analyses, reveals the relatedness of complex environments based on growth phenotypes measured by TIS, and provides a useful quality control step. Subsequently, PCA is performed on genes to identify loci whose corresponding mutants lead to concordant/discordant phenotypes across all or in a subset of screens. We used CompTIS to analyze published intestinal colonization TIS data sets from two vibrio species. Gene-level analyses identified both pan-vibrio genes required for intestinal colonization and conserved genes that displayed species-specific requirements. CompTIS is applicable to virtually any combination of TIS screens and can be implemented without regard to either the number of screens or the methods used for upstream data analysis.

IMPORTANCE Forward genetic screens are powerful tools for functional genomics. The comparison of similar forward genetic screens performed in different organisms enables the identification of genes with similar or different phenotypes across organisms. Transposon insertion sequencing is a widely used method for conducting genome-scale forward genetic screens in bacteria, yet few bioinformatic approaches have been developed to compare the results of screen replicates and different screens conducted across species or strains. Here, we used principal-component analysis (PCA) and hierarchical clustering, two unsupervised learning approaches, to analyze the relatedness of multiple in vivo screens of pathogenic vibrios. This analytic framework reveals both shared pan-vibrio requirements for intestinal colonization and strain-specific dependencies. Our findings suggest that PCA-based analytics will be a straightforward, widely applicable approach for comparing diverse transposon insertion sequencing screens.
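
The two-stage idea described above (PCA across screens, then PCA across genes) can be illustrated with a minimal sketch. This is not the CompTIS implementation: the `pca` helper, the screen and gene names, and every value in the `lfc` matrix are hypothetical and chosen only to show how the same gene-by-screen matrix might be analyzed in both orientations. It assumes each screen has already been summarized as one log2 fold change per conserved gene. Hierarchical clustering is omitted to keep the sketch dependent on NumPy alone.

```python
import numpy as np


def pca(observations, n_components=2):
    """PCA via SVD. Rows are observations, columns are features."""
    centered = observations - observations.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # coordinates of each row
    loadings = vt[:n_components]                      # weight of each column
    explained = (s ** 2) / np.sum(s ** 2)             # variance fractions
    return scores, loadings, explained[:n_components]


# Hypothetical data: rows = genes, columns = screens (log2 fold changes).
screens = ["speciesA_rep1", "speciesA_rep2", "speciesB_rep1", "speciesB_rep2"]
genes = ["gene1", "gene2", "gene3", "gene4", "gene5", "gene6"]
lfc = np.array([
    [-5.0, -4.8, -5.2, -4.9],  # mutants attenuated in both species
    [-4.5, -4.7, -0.2,  0.1],  # attenuated only in species A
    [ 0.1, -0.1, -4.6, -4.4],  # attenuated only in species B
    [ 0.0,  0.2, -0.1,  0.1],  # roughly neutral
    [ 0.2, -0.2,  0.1,  0.0],  # roughly neutral
    [-0.1,  0.1,  0.2, -0.1],  # roughly neutral
])

# Screen-level PCA: each screen is an observation described by its genes.
# Related screens (here, replicates of one species) land close together.
screen_scores, _, screen_var = pca(lfc.T)

# Gene-level PCA: each gene is an observation described by its screens.
# Loadings show which screens move together on each component.
gene_scores, gene_loadings, gene_var = pca(lfc)

for name, row in zip(screens, screen_scores):
    print(f"{name}: PC1={row[0]:.2f} PC2={row[1]:.2f}")
for name, row in zip(genes, gene_scores):
    print(f"{name}: PC1={row[0]:.2f} PC2={row[1]:.2f}")
```

In this toy matrix, the screen-level scores separate the two hypothetical species along the first component, while the gene-level analysis yields one component on which all screens load with the same sign (concordant phenotypes) and a second on which the two species load with opposite signs (discordant phenotypes). Real TIS data would require normalization and filtering choices that this sketch does not address.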

Keywords