BMC Genomics (May 2024)
Disregarding multimappers leads to biases in the functional assessment of NGS data
Abstract
Abstract Background Standard ChIP-seq and RNA-seq processing pipelines typically disregard sequencing reads whose origin is ambiguous (“multimappers”). This usual practice has potentially important consequences for the functional interpretation of the data: genomic elements belonging to clusters composed of highly similar members are left unexplored. Results In particular, disregarding multimappers leads to the underrepresentation in epigenetic studies of recently active transposable elements, such as AluYa5, L1HS and SVAs. Furthermore, this common strategy also has implications for transcriptomic analysis: members of repetitive gene families, such the ones including major histocompatibility complex (MHC) class I and II genes, are under-quantified. Conclusion Revealing inherent biases that permeate routine tasks such as functional enrichment analysis, our results underscore the urgency of broadly adopting multimapper-aware bioinformatic pipelines –currently restricted to specific contexts or communities– to ensure the reliability of genomic and transcriptomic studies.
Keywords