BMC Bioinformatics (May 2021)

Accel-Align: a fast sequence mapper and aligner based on the seed–embed–extend method

  • Yiqing Yan
  • Nimisha Chaturvedi
  • Raja Appuswamy

DOI: https://doi.org/10.1186/s12859-021-04162-z
Journal volume & issue: Vol. 22, no. 1, pp. 1–20

Abstract


Background

Improvements in sequencing technology continue to drive sequencing cost towards $100 per genome. However, mapping sequenced data to a reference genome remains computationally intensive because it depends on edit distance to handle the INDELs and mismatches introduced by sequencing. All modern aligners use the seed–filter–extend (SFE) methodology and rely on filtration heuristics to reduce the overhead of edit-distance computation. However, filtering has inherent performance–accuracy trade-offs that limit its effectiveness.

Results

Motivated by algorithmic advances in randomized low-distortion embedding, we introduce seed–embed–extend (SEE), a new methodology for developing sequence mappers and aligners. While SFE focuses on eliminating sub-optimal candidates, SEE focuses instead on identifying optimal candidates. To do so, SEE transforms the read and reference strings from the edit-distance regime to the Hamming-distance regime by embedding them with a randomized algorithm, and then uses Hamming distance over the embedded set to identify optimal candidates. To show that SEE performs well in practice, we present Accel-Align, an SEE-based short-read sequence mapper and aligner that is 3–12× faster than state-of-the-art aligners on commodity CPUs, without any special-purpose hardware, while providing comparable accuracy.

Conclusions

As sequencing technologies continue to increase read length while improving throughput and accuracy, we believe that randomized embeddings open up new avenues for optimization that cannot be achieved by using edit distance. Thus, the techniques presented in this paper have a broader scope, as they can also be applied to problems such as graph alignment, multiple sequence alignment, and sequence assembly.
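The abstract names a randomized embedding but does not say which one. The sketch below assumes the CGK embedding (Chakraborty, Goldenberg and Koucký), a well-known randomized map from edit distance to Hamming distance, to illustrate the embed and extend steps on a toy input. CGK emits a string of length 3n: at each output position it writes the current input symbol and advances the input pointer according to a random bit. The same random bits are shared by the read and every candidate. Seeding is omitted, so `candidates` is assumed to already hold candidate reference windows. All helper names (`cgk_embed`, `best_candidate`, `embed_then_extend`), the padding symbol and the fixed seed are hypothetical. The plain Levenshtein dynamic program stands in for whatever extension routine a production aligner uses. This is not presented as Accel-Align's actual implementation.

```python
import random

ALPHABET = "ACGT"  # assumption: reads and reference contain only A, C, G, T
PAD = "-"          # hypothetical padding symbol emitted after the input runs out


def make_random_bits(length, seed):
    """Shared randomness: one random bit r_j(c) per output position j and symbol c."""
    rng = random.Random(seed)
    return [{c: rng.randint(0, 1) for c in ALPHABET} for _ in range(length)]


def cgk_embed(s, bits):
    """CGK-style embedding of s into a string of len(bits) symbols."""
    out = []
    i = 0  # pointer into s
    for r in bits:
        if i < len(s):
            c = s[i]
            out.append(c)
            i += r[c]  # advance on bit 1, repeat the same symbol on bit 0
        else:
            out.append(PAD)
    return "".join(out)


def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def best_candidate(read, candidates, seed=42):
    """Embed read and candidates with the same bits; return (best index, all distances)."""
    n = max([len(read)] + [len(c) for c in candidates])
    bits = make_random_bits(3 * n, seed)  # output length 3n, as in CGK
    e_read = cgk_embed(read, bits)
    dists = [hamming(e_read, cgk_embed(c, bits)) for c in candidates]
    best = min(range(len(candidates)), key=lambda k: dists[k])  # ties -> lowest index
    return best, dists


def edit_distance(a, b):
    """Classic O(len(a) * len(b)) Levenshtein dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match / substitution
        prev = cur
    return prev[-1]


def embed_then_extend(read, candidates, seed=42):
    """Pick a candidate by embedded Hamming distance, then run edit distance on it only."""
    best, _ = best_candidate(read, candidates, seed)
    return best, edit_distance(read, candidates[best])


if __name__ == "__main__":
    read = "ACGTTGCAACGTAGGC"
    candidates = ["TTGTTGCAACGTAGGC", "ACGTTGCAACGTAGGC", "GGGGCCCCAAAATTTT"]
    print(embed_then_extend(read, candidates))  # (1, 0)
```

Each string is embedded once in linear time, and comparing a read against a candidate becomes a simple Hamming count. The quadratic edit-distance computation runs only on the single winning candidate. This mirrors the abstract's contrast between identifying the optimal candidate and filtering out sub-optimal ones.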

Keywords