GigaByte (Oct 2021)

Atria: an ultra-fast and accurate trimmer for adapter and quality trimming

  • Jiacheng Chuan,
  • Aiguo Zhou,
  • Lawrence Richard Hale,
  • Miao He,
  • Xiang Li

DOI
https://doi.org/10.46471/gigabyte.31

Abstract

As next-generation sequencing has advanced, adapters attached to reads and low-quality bases continue to hinder downstream analysis, both directly and indirectly. For example, they can produce false-positive single nucleotide polymorphisms (SNPs) and fragmented assemblies. A fast trimming algorithm is needed to remove adapters precisely, especially in read tails, where base quality is relatively low. Here, we present Atria, a trimming program that matches the adapters in paired reads and finds possible overlapped regions using a fast, carefully designed byte-based matching algorithm (O(n) time with O(1) space). Atria also implements multi-threading in both sequence processing and file compression, and it supports single-end reads. Compared with other trimmers, Atria performs favorably in various trimming and runtime benchmarks on both simulated and real data. We also provide a fast and lightweight byte-based matching algorithm, which can be used in various short-sequence matching applications, such as primer search and seed scanning before alignment.
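
To make the idea of byte-based adapter matching concrete, the following is a minimal Python sketch. It is not Atria's implementation. The one-hot encoding, the function names `encode` and `find_adapter`, the fixed 16-base seed, and the `min_matches` threshold are all assumptions chosen for illustration. Each base is stored as one byte with a single bit set, so two bases match when the bitwise AND of their bytes is non-zero. With a fixed seed length, the scan does a constant amount of work per read position.

```python
# Illustrative sketch only; the encoding and names are assumptions, not Atria's code.
# One byte per base, one bit per nucleotide. N is encoded as 0 so it never matches.
ENCODE = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000, "N": 0b0000}


def encode(seq):
    """Encode a DNA string as bytes using the hypothetical one-hot scheme."""
    return bytes(ENCODE[base] for base in seq.upper())


def find_adapter(read, adapter, seed_len=16, min_matches=14):
    """Return the read position whose window best matches the adapter seed.

    Only full-length windows are scanned, so adapters truncated at the
    read tail are not detected by this simplified version.
    Returns -1 when no window reaches min_matches matching bases.
    """
    r = encode(read)
    seed = encode(adapter[:seed_len])
    best_pos, best_score = -1, min_matches - 1
    for start in range(len(r) - len(seed) + 1):
        # A non-zero AND means the two encoded bases are identical.
        score = sum(1 for i in range(len(seed)) if r[start + i] & seed[i])
        if score > best_score:
            best_pos, best_score = start, score
    return best_pos


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
    read = "ACGTACGTAC" + "AGATCGGAAGAGCACACGTC"
    pos = find_adapter(read, adapter)
    print(pos, read[:pos])
```

In this sketch the threshold tolerates up to two mismatches within the seed, which is one simple way to handle sequencing errors near the adapter. A production trimmer would also need to handle partial adapters at the read end and use quality scores.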