BMC Bioinformatics (Dec 2018)
SMARTcleaner: identify and clean off-target signals in SMART ChIP-seq analysis
Abstract
Abstract Background Noises and artifacts may arise in several steps of the next-generation sequencing (NGS) process. Recently, an NGS library preparation method called SMART, or Switching Mechanism At the 5′ end of the RNA Transcript, is introduced to prepare ChIP-seq (chromatin immunoprecipitation and deep sequencing) libraries from small amount of DNA material, using the DNA SMART ChIP-seq Kit. The protocol adds Ts to the 3′ end of DNA templates, which is subsequently recognized and used by SMART poly(dA) primers for reverse transcription and then addition of PCR primers and sequencing adapters. The poly(dA) primers, however, can anneal to poly(T) sequences in a genome and amplify DNA fragments that are not enriched in the immunoprecipitated DNA templates. This off-target amplification results in false signals in the ChIP-seq data. Results Here, we show that the off-target ChIP-seq reads derived from false amplification of poly(T/A) genomic sequences have unique and strand-specific features. Accordingly, we develop a tool (called “SMARTcleaner”) that can exploit these features to remove SMART ChIP-seq artifacts. Application of SMARTcleaner to several SMART ChIP-seq datasets demonstrates that it can remove reads from off-target amplification effectively, leading to significantly improved ChIP-seq peaks and results. Conclusions SMARTcleaner could identify and clean the false signals in SMART-based ChIP-seq libraries, leading to improvement in peak calling, and downstream data analysis and interpretation.
Keywords