mSystems (Aug 2022)

Forecasting Staphylococcus aureus Infections Using Genome-Wide Association Studies, Machine Learning, and Transcriptomic Approaches

  • Mohamed Sassi,
  • Julie Bronsard,
  • Gaetan Pascreau,
  • Mathieu Emily,
  • Pierre-Yves Donnio,
  • Matthieu Revest,
  • Brice Felden,
  • Thierry Wirth,
  • Yoann Augagneur

DOI
https://doi.org/10.1128/msystems.00378-22
Journal volume & issue
Vol. 7, no. 4

Abstract

Read online

ABSTRACT Staphylococcus aureus is a major human and animal pathogen, colonizing diverse ecological niches within its hosts. Predicting whether an isolate will infect a specific host and its subsequent clinical fate remains unknown. In this study, we investigated the S. aureus pangenome using a curated set of 356 strains, spanning a wide range of hosts, origins, and clinical display and antibiotic resistance profiles. We used genome-wide association study (GWAS) and random forest (RF) algorithms to discriminate strains based on their origins and clinical sources. Here, we show that the presence of sak and scn can discriminate strains based on their host specificity, while other genes such as mecA are often associated with virulent outcomes. Both GWAS and RF indicated the importance of intergenic regions (IGRs) and coding DNA sequence (CDS) but not sRNAs in forecasting an outcome. Additional transcriptomic analyses performed on the most prevalent clonal complex 8 (CC8) clonal types, in media mimicking nasal colonization or bacteremia, indicated three RNAs as potential RNA markers to forecast infection, followed by 30 others that could serve as infection severity predictors. Our report shows that genetic association and transcriptomics are complementary approaches that will be combined in a single analytical framework to improve our understanding of bacterial pathogenesis and ultimately identify potential predictive molecular markers. IMPORTANCE Predicting the outcome of bacterial colonization and infections, based on extensive genomic and transcriptomic data from a given pathogen, would be of substantial help for clinicians in treating and curing patients. In this report, genome-wide association studies and random forest algorithms have defined gene combinations that differentiate human from animal strains, colonization from diseases, and nonsevere from severe diseases, while it revealed the importance of IGRs and CDS, but not small RNAs (sRNAs), in anticipating an outcome. In addition, transcriptomic analyses performed on the most prevalent clonal types, in media mimicking either nasal colonization or bacteremia, revealed significant differences and therefore potent RNA markers. Overall, the use of both genomic and transcriptomic data in a single analytical framework can enhance our understanding of bacterial pathogenesis.

Keywords