Genome Biology (Jan 2021)

BlastFrost: fast querying of 100,000s of bacterial genomes in Bifrost graphs

  • Nina Luhmann,
  • Guillaume Holley,
  • Mark Achtman

DOI
https://doi.org/10.1186/s13059-020-02237-3
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 15

Abstract

BlastFrost is a highly efficient method for querying 100,000s of genome assemblies, building on Bifrost, a dynamic data structure for compacted and colored de Bruijn graphs. BlastFrost queries a Bifrost data structure for sequences of interest and extracts local subgraphs, enabling the identification of the presence or absence of individual genes or single nucleotide sequence variants. We show two examples using Salmonella genomes: finding within minutes the presence of genes in the SPI-2 pathogenicity island in a collection of 926 genomes and identifying single nucleotide polymorphisms associated with fluoroquinolone resistance in three genes among 190,209 genomes. BlastFrost is available at https://github.com/nluhmann/BlastFrost/tree/master/data.
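
To illustrate the general idea of presence/absence queries over a colored k-mer index, the following is a toy sketch only. It is not BlastFrost's or Bifrost's implementation or API: it uses a plain dictionary instead of a compacted de Bruijn graph, ignores reverse complements, and does no subgraph extraction. The function names, the `min_fraction` parameter, and the example sequences are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_index(genomes, k):
    """Map each k-mer to the set of genome names (its "colors") containing it."""
    index = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def query_presence(index, genomes, query, k, min_fraction=1.0):
    """Return, per genome, whether it contains at least min_fraction
    of the query's k-mers (hypothetical threshold, for illustration)."""
    query_kmers = list(kmers(query, k))
    if not query_kmers:
        raise ValueError("query must be at least k characters long")
    hits = {name: 0 for name in genomes}
    for kmer in query_kmers:
        # .get avoids inserting missing k-mers into the defaultdict
        for name in index.get(kmer, ()):
            hits[name] += 1
    total = len(query_kmers)
    return {name: count / total >= min_fraction for name, count in hits.items()}


# Made-up toy sequences; g2 differs from g1 by a single substitution.
genomes = {"g1": "ACGTACGTGA", "g2": "ACGTTCGTGA"}
index = build_colored_index(genomes, k=4)
print(query_presence(index, genomes, "GTACGT", k=4))
```

In this sketch, lowering `min_fraction` tolerates k-mers lost to sequence variation, since a single substitution removes up to k of the query's k-mers from a genome.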