STARRPeaker: uniform processing and accurate identification of STARR-seq active regions

Donghoon Lee; Manman Shi; Jennifer Moran; Martha Wall; Jing Zhang; Jason Liu; Dominic Fitzgerald; Yasuhiro Kyono; Lijia Ma; Kevin P. White; Mark Gerstein

doi:10.1186/s13059-020-02194-x

Genome Biology (Dec 2020)

Affiliations

Donghoon Lee: Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai
Manman Shi: Institute for Genomics and System Biology, University of Chicago
Jennifer Moran: Institute for Genomics and System Biology, University of Chicago
Martha Wall: Institute for Genomics and System Biology, University of Chicago
Jing Zhang: School of Information and Computer Sciences, University of California
Jason Liu: Program in Computational Biology and Bioinformatics, Yale University
Dominic Fitzgerald: Institute for Genomics and System Biology, University of Chicago
Yasuhiro Kyono: Institute for Genomics and System Biology, University of Chicago
Lijia Ma: Institute for Genomics and System Biology, University of Chicago
Kevin P. White: Institute for Genomics and System Biology, University of Chicago
Mark Gerstein: Program in Computational Biology and Bioinformatics, Yale University

DOI: https://doi.org/10.1186/s13059-020-02194-x
Journal volume & issue: Vol. 21, no. 1, pp. 1–24

Abstract

STARR-seq technology has employed progressively more complex genomic libraries and increased sequencing depths. An issue with the increased complexity and depth is that the coverage in STARR-seq experiments is non-uniform, overdispersed, and often confounded by sequencing biases, such as GC content. Furthermore, STARR-seq readout is confounded by RNA secondary structure and thermodynamic stability. To address these potential confounders, we developed a negative binomial regression framework for uniformly processing STARR-seq data, called STARRPeaker. Moreover, to aid our effort, we generated whole-genome STARR-seq data from the HepG2 and K562 human cell lines and applied STARRPeaker to comprehensively and unbiasedly call enhancers in them.
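
To make the idea of a negative binomial model with covariates concrete, here is a minimal sketch. It is not the STARRPeaker implementation. It assumes a log-link mean built from input coverage, GC content, and RNA folding energy, and a variance of mu + alpha * mu^2. The function names, coefficient names, and numeric values are all hypothetical and chosen only for illustration.

```python
import math


def expected_output(input_cov, gc, fold_energy, coef):
    """Expected STARR-seq output coverage for one bin under a log-link model.

    The covariates and coefficient names are illustrative assumptions,
    not values or names taken from the paper.
    """
    eta = (coef["intercept"]
           + coef["log_input"] * math.log(input_cov)
           + coef["gc"] * gc
           + coef["fold_energy"] * fold_energy)
    return math.exp(eta)


def nb_log_pmf(k, mu, alpha):
    """Log-probability of count k for a negative binomial with mean mu
    and dispersion alpha (variance = mu + alpha * mu**2)."""
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log1p(-p))


def nb_upper_tail(k, mu, alpha):
    """P(X >= k): chance of seeing at least k reads if the bin is inactive.

    Computed as 1 minus the summed lower tail, which is simple but loses
    precision for very small p-values.
    """
    lower = sum(math.exp(nb_log_pmf(i, mu, alpha)) for i in range(k))
    return max(0.0, 1.0 - lower)


# Hypothetical coefficients and one hypothetical genomic bin.
coef = {"intercept": 0.1, "log_input": 0.9, "gc": 0.5, "fold_energy": -0.01}
mu = expected_output(input_cov=40.0, gc=0.55, fold_energy=-30.0, coef=coef)
p_value = nb_upper_tail(k=120, mu=mu, alpha=0.2)
print(f"expected coverage: {mu:.1f}, p-value for 120 reads: {p_value:.3g}")
```

In this sketch the dispersion parameter absorbs the overdispersion mentioned in the abstract, while the covariates shift the expected coverage, so a bin is flagged only when its observed output exceeds what those confounders would predict.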

Published in Genome Biology

ISSN: 1474-760X (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General): Genetics
Website: https://genomebiology.biomedcentral.com/
