Genome Biology (May 2023)

SPUMONI 2: improved classification using a pangenome index of minimizer digests

  • Omar Y. Ahmed,
  • Massimiliano Rossi,
  • Travis Gagie,
  • Christina Boucher,
  • Ben Langmead

DOI
https://doi.org/10.1186/s13059-023-02958-1
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 15

Abstract

Genomics analyses use large reference sequence collections, like pangenomes or taxonomic databases. SPUMONI 2 is an efficient tool for sequence classification of both short and long reads. It performs multi-class classification using a novel sampled document array. By incorporating minimizers, SPUMONI 2’s index is 65 times smaller than minimap2’s for a mock community pangenome. SPUMONI 2 achieves a speed improvement of 3-fold compared to SPUMONI and 15-fold compared to minimap2. We show SPUMONI 2 achieves an advantageous mix of accuracy and efficiency in practical scenarios such as adaptive sampling, contamination detection and multi-class metagenomics classification.
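As a rough illustration of the "minimizer digest" idea named in the title, the sketch below replaces a sequence with the concatenation of its window minimizers. It is not SPUMONI 2's implementation: the function name `minimizer_digest`, the values of `k` and `w`, the lexicographic ordering of k-mers, and the leftmost tie-breaking rule are all assumptions chosen for illustration.

```python
def minimizer_digest(seq, k=4, w=11):
    """Concatenate the minimizer of every length-w window of seq.

    Assumptions (illustrative only): the minimizer is the
    lexicographically smallest k-mer in the window, ties go to the
    leftmost occurrence, and a minimizer is emitted only when its
    position differs from the previous window's minimizer position.
    """
    if k > w or len(seq) < w:
        return ""
    digest = []
    prev_pos = -1
    for start in range(len(seq) - w + 1):
        # Candidate k-mer start positions that fit inside this window.
        candidates = range(start, start + w - k + 1)
        best_pos = min(candidates, key=lambda i: (seq[i:i + k], i))
        if best_pos != prev_pos:
            digest.append(seq[best_pos:best_pos + k])
            prev_pos = best_pos
    return "".join(digest)


if __name__ == "__main__":
    print(minimizer_digest("GGGGAAAAGGGG", k=4, w=8))
```

In this toy input every window shares the same smallest k-mer, so the digest collapses to a single k-mer. On real sequences the reduction depends on `k`, `w`, and the sequence content, and highly repetitive runs can tie in ways that shrink the text less than expected.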

Keywords