Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

Dongjun Chung; Pei Fen Kuan; Bo Li; Rajendran Sanalkumar; Kun Liang; Emery H Bresnick; Colin Dewey; Sündüz Keleş

doi:10.1371/journal.pcbi.1002111

PLoS Computational Biology (Jul 2011)

DOI: https://doi.org/10.1371/journal.pcbi.1002111
Journal volume & issue: Vol. 7, no. 7, p. e1002111

Abstract

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
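
The idea of allocating multi-reads as fractional counts can be illustrated with a small sketch. The code below is not the authors' weighted alignment scheme; it is a simplified iterative re-weighting written for illustration only. It assumes each read comes with a list of candidate (chromosome, position) alignments, and that a multi-read is more likely to originate from the candidate location whose surrounding bin already holds more read mass. The function name `allocate_multi_reads` and the parameters `window`, `n_iter`, and `pseudo` are hypothetical.

```python
from collections import defaultdict


def allocate_multi_reads(alignments, window=100, n_iter=20, pseudo=1e-3):
    """Assign each read a fractional weight per candidate location.

    alignments: dict mapping read id -> list of (chrom, pos) candidates.
    Uni-reads have exactly one candidate; multi-reads have several.
    Returns a dict mapping read id -> list of weights that sum to 1.
    NOTE: illustrative heuristic, not the published method.
    """
    # Start by splitting every read evenly across its candidate locations.
    weights = {
        read: [1.0 / len(locs)] * len(locs)
        for read, locs in alignments.items()
    }

    for _ in range(n_iter):
        # Accumulate the current fractional counts in fixed-size bins.
        counts = defaultdict(float)
        for read, locs in alignments.items():
            for (chrom, pos), w in zip(locs, weights[read]):
                counts[(chrom, pos // window)] += w

        # Re-weight each read in proportion to the mass contributed by
        # *other* reads in each candidate bin (own weight is subtracted),
        # plus a small pseudo-count so no location drops to exactly zero.
        for read, locs in alignments.items():
            scores = [
                counts[(chrom, pos // window)] - w + pseudo
                for (chrom, pos), w in zip(locs, weights[read])
            ]
            total = sum(scores)
            weights[read] = [s / total for s in scores]

    return weights
```

Under these assumptions, a multi-read with one candidate inside a cluster of uni-reads and another in an empty region ends up with nearly all of its weight on the first candidate, so it contributes close to one full count to that region in downstream peak calling.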

Published in PLoS Computational Biology

ISSN: 1553-734X (Print); 1553-7358 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Science: Biology (General)
Website: https://journals.plos.org/ploscompbiol/