mBio (Sep 2024)

Holistic understanding of trimethoprim resistance in Streptococcus pneumoniae using an integrative approach of genome-wide association study, resistance reconstruction, and machine learning

  • Nguyen-Phuong Pham,
  • Hélène Gingras,
  • Chantal Godin,
  • Jie Feng,
  • Alexis Groppi,
  • Macha Nikolski,
  • Philippe Leprohon,
  • Marc Ouellette

DOI
https://doi.org/10.1128/mbio.01360-24
Journal volume & issue
Vol. 15, no. 9

Abstract

ABSTRACT Antimicrobial resistance (AMR) is a public health threat worldwide. Next-generation sequencing (NGS) has opened unprecedented opportunities to accelerate AMR mechanism discovery and diagnostics. Here, we present an integrative approach to investigate trimethoprim (TMP) resistance in the key pathogen Streptococcus pneumoniae. We explored a collection of 662 S. pneumoniae genomes by conducting a genome-wide association study (GWAS), followed by functional validation using resistance reconstruction experiments, combined with machine learning (ML) approaches to predict TMP minimum inhibitory concentration (MIC). Our study showed that multiple additive mutations in the folA and sulA loci are responsible for TMP non-susceptibility in S. pneumoniae and can be used as key features to build ML models for digital MIC prediction, reaching an average accuracy within ±1 twofold dilution factor of 86.3%. Our roadmap of in silico analysis, wet-lab validation, and diagnostic tool building could be adapted to explore AMR in other combinations of bacteria–antibiotic.

IMPORTANCE In the age of next-generation sequencing (NGS), while data-driven methods such as genome-wide association study (GWAS) and machine learning (ML) excel at finding patterns, functional validation can be challenging due to the high numbers of candidate variants. We designed an integrative approach combining a GWAS on S. pneumoniae clinical isolates, followed by whole-genome transformation coupled with NGS to functionally characterize a large set of GWAS candidates. Our study validated several phenotypic folA mutations beyond the standard Ile100Leu mutation, and showed that the overexpression of the sulA locus produces trimethoprim (TMP) resistance in Streptococcus pneumoniae. These validated loci, when used to build ML models, were found to be the best inputs for predicting TMP minimal inhibitory concentrations. Integrative approaches can bridge the genotype-phenotype gap by biological insights that can be incorporated in ML models for accurate prediction of drug susceptibility.
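
The "accuracy within ±1 twofold dilution factor" reported above is a common way to score MIC predictions: a prediction counts as correct when it differs from the measured MIC by at most one doubling or halving. The abstract does not give the authors' exact evaluation procedure, so the following is only a minimal sketch of one usual reading of that metric. The function name `within_one_dilution_accuracy`, its `tolerance` parameter, and the MIC values are hypothetical and chosen for illustration; they are not data from the study.

```python
import math


def within_one_dilution_accuracy(true_mics, predicted_mics, tolerance=1.0):
    """Fraction of isolates whose predicted MIC lies within `tolerance`
    twofold dilution steps of the measured MIC (hypothetical helper)."""
    if len(true_mics) != len(predicted_mics):
        raise ValueError("inputs must have the same length")
    if not true_mics:
        raise ValueError("inputs must be non-empty")
    hits = 0
    for measured_mic, predicted_mic in zip(true_mics, predicted_mics):
        # One twofold dilution step equals one unit on the log2 scale.
        diff = abs(math.log2(predicted_mic) - math.log2(measured_mic))
        # Small epsilon guards against floating-point rounding at the boundary.
        if diff <= tolerance + 1e-9:
            hits += 1
    return hits / len(true_mics)


# Made-up MIC values in mg/L, for illustration only (not from the study).
measured = [0.5, 1.0, 2.0, 8.0, 16.0]
predicted = [1.0, 1.0, 8.0, 4.0, 16.0]
accuracy = within_one_dilution_accuracy(measured, predicted)
print(f"Accuracy within ±1 twofold dilution: {accuracy:.1%}")
```

Comparing on the log2 scale reflects the fact that MICs are measured in doubling dilution series, so an error of one step means the same thing at 0.5 mg/L as at 16 mg/L.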

Keywords