BMC Bioinformatics (Jun 2023)

Rescuing biologically relevant consensus regions across replicated samples

  • Vahid Jalili
  • Marzia A. Cremona
  • Fernando Palluzzi

DOI
https://doi.org/10.1186/s12859-023-05340-x
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 16

Abstract

Background: Protein-DNA binding sites in ChIP-seq experiments are identified where the binding affinity is significant with respect to a given threshold. The choice of threshold is a trade-off between conservative region identification and discarding weak but true binding sites.

Results: We rescue weak binding sites using MSPC, which efficiently exploits replicates to lower the threshold required to identify a site while keeping a low false-positive rate, and we compare it to IDR, a widely used post-processing method for identifying highly reproducible peaks across replicates. We observe several master transcription regulators (e.g., SP1 and GATA3) and HDAC2-GATA1 regulatory networks on rescued regions in the K562 cell line.

Conclusions: We argue for the biological relevance of weak binding sites and the information they add when rescued by MSPC. An implementation of the proposed extended MSPC methodology and the scripts to reproduce the analysis are freely available at https://genometric.github.io/MSPC/; MSPC is distributed as a command-line application and as an R package available from Bioconductor (https://doi.org/doi:10.18129/B9.bioc.rmspc).

Keywords