Vavilov Journal of Genetics and Breeding (Feb 2017)

Computer analysis of co-localization of transcription factor binding sites in genome by ChIP-seq data

  • A. I. Dergilev,
  • A. M. Spitsina,
  • I. V. Chadaeva,
  • A. V. Svichkarev,
  • F. M. Naumenko,
  • E. V. Kulakova,
  • E. R. Galieva,
  • E. E. Vityaev,
  • M. Chen,
  • Y. L. Orlov

DOI
https://doi.org/10.18699/VJ16.194
Journal volume & issue
Vol. 20, no. 6
pp. 770 – 778

Abstract


Statistical features of the distribution of transcription factor (TF) binding sites in the mouse genome, obtained from ChIP-seq experiments in embryonic stem cells, are considered. Clusters containing binding sites for four or more different transcription factors were identified in the mouse genome, and their location relative to gene regulatory regions was described. Two types of site co-localization were found: clusters containing binding sites for Oct4, Nanog and Sox2, located in distal regions, and clusters containing binding sites for n-Myc and c-Myc, located mainly in the promoter regions of mouse genes. Analysis of new ChIP-seq data on the binding of the transcription factors Nr5a2 and Tbx3 in the same cell type confirmed the division of TF binding site clusters into two types: those containing binding sites of pluripotency regulators (Oct4, Nanog and others) and those that do not. A computer program was developed for the statistical processing of gene locations and chromatin domains; it analyzes experimental data on site localization obtained by ChIP-seq in the mouse and human genomes. Using this program, positional preferences of binding sites of various TF types were revealed, and the distances between the nearest groups of Oct4, Nanog and Sox2 binding sites and groups of n-Myc and c-Myc binding sites were calculated. The presence of TF binding site nucleotide motifs in the selected ChIP-seq regions was estimated, and the motifs were refined. A correlation between motif presence and ChIP-seq binding intensity was shown. Computer methods for estimating the clustering of binding sites of different transcription factors in new ChIP-seq data were developed. The programs are available from the authors upon request.
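The abstract does not state how clusters are defined or how distances are measured, and the authors' programs are available only on request. As a rough illustration only, the Python sketch that follows assumes each site is a single coordinate (e.g. a peak summit). It groups sites on the same chromosome by single linkage whenever consecutive sites lie within a hypothetical `max_gap` of 500 bp, keeps groups with at least four distinct TFs, and labels a cluster "promoter" if its centre lies within a hypothetical 1 kb of a TSS. It then computes nearest-neighbour distances between two sets of positions by binary search and uses a Pearson (point-biserial) correlation between motif presence (0/1) and peak intensity. All function names, thresholds and the toy data are hypothetical and are not the authors' method.

```python
import math
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    chrom: str  # chromosome name, e.g. "chr1"
    pos: int    # single coordinate per site (assumed: peak summit)
    tf: str     # transcription factor bound at this site


def find_clusters(sites, max_gap=500, min_distinct=4):
    """Single-linkage grouping (hypothetical criterion): sorted sites on the
    same chromosome join one group while consecutive gaps are <= max_gap bp.
    Only groups with at least min_distinct different TFs are returned."""
    clusters, current = [], []

    def flush():
        if len({s.tf for s in current}) >= min_distinct:
            clusters.append(list(current))

    for s in sorted(sites, key=lambda x: (x.chrom, x.pos)):
        if current and (s.chrom != current[-1].chrom
                        or s.pos - current[-1].pos > max_gap):
            flush()
            current.clear()
        current.append(s)
    flush()
    return clusters


def cluster_center(cluster):
    """Midpoint between the first and last site of a sorted cluster."""
    return cluster[0].chrom, (cluster[0].pos + cluster[-1].pos) // 2


def classify(cluster, tss_by_chrom, promoter_window=1000):
    """'promoter' if the cluster centre lies within promoter_window bp of a
    TSS on the same chromosome, otherwise 'distal' (hypothetical rule)."""
    chrom, centre = cluster_center(cluster)
    dists = [abs(centre - t) for t in tss_by_chrom.get(chrom, [])]
    return "promoter" if dists and min(dists) <= promoter_window else "distal"


def nearest_distances(group_a, group_b):
    """For each (chrom, pos) in group_a, the distance to the nearest position
    of group_b on the same chromosome, found by binary search. Positions on
    chromosomes absent from group_b are skipped."""
    ref = {}
    for chrom, pos in group_b:
        ref.setdefault(chrom, []).append(pos)
    for positions in ref.values():
        positions.sort()
    out = []
    for chrom, pos in group_a:
        positions = ref.get(chrom)
        if not positions:
            continue
        i = bisect_left(positions, pos)
        # Only the neighbours on either side of the insertion point can be nearest.
        out.append(min(abs(pos - positions[j])
                       for j in (i - 1, i) if 0 <= j < len(positions)))
    return out


def pearson(x, y):
    """Pearson correlation; with a 0/1 vector x this equals the point-biserial
    correlation between motif presence and ChIP-seq signal."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data, made up for illustration only (TF_A and TF_B are placeholders).
sites = [
    Site("chr1", 1000, "Oct4"), Site("chr1", 1200, "Sox2"),
    Site("chr1", 1350, "Nanog"), Site("chr1", 1500, "Nr5a2"),
    Site("chr1", 50000, "c-Myc"), Site("chr1", 50100, "n-Myc"),
    Site("chr1", 50300, "TF_A"), Site("chr1", 50400, "TF_B"),
    Site("chr1", 90000, "Oct4"),
]
tss = {"chr1": [50200]}  # hypothetical TSS coordinate

clusters = find_clusters(sites)
print([classify(c, tss) for c in clusters])  # ['distal', 'promoter']
print(nearest_distances([cluster_center(clusters[0])],
                        [cluster_center(clusters[1])]))  # [48950]
print(round(pearson([1, 1, 0, 0], [10.0, 8.0, 2.0, 4.0]), 3))  # 0.949
```

Single linkage on sorted coordinates keeps the sketch simple (O(n log n) overall), but the number of clusters depends strongly on the assumed `max_gap`. A fixed-window scan, or comparing observed cluster counts against shuffled site positions, would be reasonable alternatives.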

Keywords