Mycology (Jun 2024)

An alignment- and reference-free strategy using k-mer present pattern for population genomic analyses

  • Guohui Shi,
  • Yi Dai,
  • Da Zhou,
  • Mengmeng Chen,
  • Jiaqi Zhang,
  • Yilong Bi,
  • Shuai Liu,
  • Qi Wu

DOI
https://doi.org/10.1080/21501203.2024.2358868

Abstract

Read online

Pangenomes are replacing single reference genomes to capture all variants within a species or clade, but their analysis predominantly leverages graph-based methods that require multiple high-quality genomes and computationally intensive multiple-genome alignments. K-mer decomposition is an alternative to graph-based pangenomes. However, how to directly use k-mers for the population genetic analyses is unknown. Here, we developed a novel strategy that uses the variants of k-mer count in the genome for population analyses. To test the effectivity of this method, we compared it directly to the SNP-based method on the analysis of population structure and genetic diversity of 267 Saccharomyces cerevisiae strains within two simulated datasets and a real sequence dataset. The population structure identified with k-mers recapitulates that obtained using SNPs, indicating the effectiveness of k-mer-based approach, and higher genetic diversity within real dataset supported k-mers contained more genetic variants. Based on k-mer frequency, we found not only SNP but also some insertion/deletion and horizontal gene transfer (HGT) fragments related to the adaptive evolution of S. cerevisiae. Our study creates a framework for the alignment- and reference-free (ARF) method in population genetic analyses, which will be more pronounced in the species with no complete genome or highly diverged species.

Keywords