BMC Genomics (Feb 2022)

HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly

  • Sheina B. Sim,
  • Renee L. Corpuz,
  • Tyler J. Simmonds,
  • Scott M. Geib

DOI
https://doi.org/10.1186/s12864-022-08375-1
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 7

Abstract

Read online

Abstract Background Pacific Biosciences HiFi read technology is currently the industry standard for high accuracy long-read sequencing that has been widely adopted by large sequencing and assembly initiatives for generation of de novo assemblies in non-model organisms. Though adapter contamination filtering is routine in traditional short-read analysis pipelines, it has not been widely adopted for HiFi workflows. Results Analysis of 55 publicly available HiFi datasets revealed that a read-sanitation step to remove sequence artifacts derived from PacBio library preparation from read pools is necessary as adapter sequences can be erroneously integrated into assemblies. Conclusions Here we describe the nature of adapter contaminated reads, their consequences in assembly, and present HiFiAdapterFilt, a simple and memory efficient solution for removing adapter contaminated reads prior to assembly.

Keywords