Genome Biology (Dec 2020)

STARRPeaker: uniform processing and accurate identification of STARR-seq active regions

  • Donghoon Lee,
  • Manman Shi,
  • Jennifer Moran,
  • Martha Wall,
  • Jing Zhang,
  • Jason Liu,
  • Dominic Fitzgerald,
  • Yasuhiro Kyono,
  • Lijia Ma,
  • Kevin P. White,
  • Mark Gerstein

DOI
https://doi.org/10.1186/s13059-020-02194-x
Journal volume & issue
Vol. 21, no. 1
pp. 1 – 24

Abstract

STARR-seq technology has employed progressively more complex genomic libraries and increased sequencing depths. One consequence of this increased complexity and depth is that coverage in STARR-seq experiments is non-uniform, overdispersed, and often confounded by sequencing biases such as GC content. Furthermore, the STARR-seq readout is confounded by RNA secondary structure and thermodynamic stability. To address these potential confounders, we developed STARRPeaker, a negative binomial regression framework for uniformly processing STARR-seq data. To support this effort, we generated whole-genome STARR-seq data from the HepG2 and K562 human cell lines and applied STARRPeaker to call enhancers in both cell lines comprehensively and in an unbiased manner.
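
The general idea of scoring a genomic bin against a negative binomial model with covariates can be sketched as follows. This is an illustrative sketch only, not the STARRPeaker implementation: the function names, the covariate set (input coverage, GC content, RNA folding energy), the log-linear form of the mean, and all coefficient values are assumptions made for the example. In practice the coefficients and the dispersion would be estimated from the data by maximum likelihood rather than set by hand.

```python
import math


def nb_log_pmf(k, mu, size):
    """Log-probability of count k under a negative binomial distribution
    with mean mu and dispersion parameter size (variance = mu + mu**2 / size)."""
    return (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )


def nb_upper_tail(k, mu, size):
    """P(X >= k): chance of seeing at least k reads if the bin is inactive.
    Computed as 1 minus the lower tail, so very small values lose precision."""
    if k <= 0:
        return 1.0
    lower = sum(math.exp(nb_log_pmf(i, mu, size)) for i in range(k))
    return max(0.0, 1.0 - lower)


def expected_output(input_cov, gc, fold_energy, coef):
    """Expected RNA (output) coverage from a log-linear model of the covariates.
    The covariates and the keys of coef are hypothetical choices for this sketch."""
    log_mu = (
        coef["intercept"]
        + coef["log_input"] * math.log(input_cov)
        + coef["gc"] * gc
        + coef["fold"] * fold_energy
    )
    return math.exp(log_mu)


# Hypothetical coefficients and dispersion, chosen by hand for illustration.
coef = {"intercept": 0.1, "log_input": 0.9, "gc": 0.5, "fold": -0.01}
mu = expected_output(input_cov=40.0, gc=0.55, fold_energy=-30.0, coef=coef)
p_value = nb_upper_tail(k=120, mu=mu, size=5.0)
```

A bin whose observed output coverage gives a small upper-tail probability under the fitted model would be a candidate active region. The negative binomial is used here, rather than a Poisson distribution, because its extra dispersion parameter accommodates the overdispersed coverage described above.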