Entropy (Aug 2019)

A Clustering Approach for Motif Discovery in ChIP-Seq Dataset

  • Chun-xiao Sun,
  • Yu Yang,
  • Hua Wang,
  • Wen-hu Wang

DOI
https://doi.org/10.3390/e21080802
Journal volume & issue
Vol. 21, no. 8
p. 802

Abstract

Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) technology has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To discover TFBSs effectively and efficiently in the thousand or more DNA sequences contained in a ChIP-Seq data set, we propose a new algorithm named AP-ChIP. First, we set two thresholds based on probabilistic analysis to construct and further filter the cluster subsets. Then, we apply Affinity Propagation (AP) clustering to the candidate cluster subsets to find the potential motifs. Experimental results on simulated data show that AP-ChIP predicts TFBSs almost accurately within a reasonable time. The validity of the AP-ChIP algorithm is also tested on a real ChIP-Seq data set.
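
The clustering step can be illustrated with a minimal sketch of generic Affinity Propagation on a handful of equal-length k-mers. This is not the AP-ChIP implementation: the abstract does not specify the similarity measure, the preference value, or the two probabilistic thresholds, so the sketch assumes negative Hamming distance as similarity and the median off-diagonal similarity as preference, and it skips the threshold-based filtering entirely. The function names (`hamming`, `similarity_matrix`, `affinity_propagation`) and the toy k-mers are hypothetical.

```python
from statistics import median


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def similarity_matrix(kmers):
    """Assumed similarity: negative Hamming distance.
    The diagonal (the AP 'preference') is set to the median off-diagonal value."""
    n = len(kmers)
    S = [[-float(hamming(kmers[i], kmers[j])) for j in range(n)] for i in range(n)]
    pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = pref
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Plain message-passing AP (responsibilities R, availabilities A)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        # Responsibility: how well suited k is as exemplar for i,
        # relative to the best competing candidate.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: accumulated evidence that k should be an exemplar.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from points other than k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # no exemplar emerged within the iteration budget
        return [], [None] * n
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Toy input: two groups of 6-mers, each a "center" plus two one-mismatch variants.
kmers = ["ACGTAC", "CCGTAC", "ACGTAA", "TGCATG", "GGCATG", "TGCATT"]
exemplars, labels = affinity_propagation(similarity_matrix(kmers))
for i, lab in enumerate(labels):
    print(kmers[i], "->", None if lab is None else kmers[lab])
```

In this sketch each exemplar plays the role of a candidate motif instance, and the k-mers assigned to it form its cluster. AP needs no preset number of clusters; the preference value controls how many exemplars emerge.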

Keywords