iScience (Sep 2024)

PanKA: Leveraging population pangenome to predict antibiotic resistance

  • Van Hoan Do,
  • Van Sang Nguyen,
  • Son Hoang Nguyen,
  • Duc Quang Le,
  • Tam Thi Nguyen,
  • Canh Hao Nguyen,
  • Tho Huu Ho,
  • Nam S. Vo,
  • Trang Nguyen,
  • Hoang Anh Nguyen,
  • Minh Duc Cao

Journal volume & issue
Vol. 27, no. 9
p. 110623

Abstract

Machine learning has the potential to be a powerful tool in the fight against antimicrobial resistance (AMR), a critical global health issue. It can identify resistance mechanisms from DNA sequence data without prior knowledge. The first step in building a machine learning model is feature extraction from sequencing data. Traditional methods such as single nucleotide polymorphism (SNP) calling and k-mer counting yield numerous, often redundant features, which complicates prediction and analysis. In this paper, we propose PanKA, a method that uses the pangenome to extract a concise set of relevant features for predicting AMR. PanKA not only enables fast model training and prediction but also improves accuracy. Applied to Escherichia coli and Klebsiella pneumoniae, our model is more accurate than conventional and state-of-the-art methods in predicting AMR.

Keywords