PLoS ONE (Jan 2018)

Modelling of the breadth of expression from promoter architectures identifies pro-housekeeping transcription factors.

  • Lukasz Huminiecki

DOI
https://doi.org/10.1371/journal.pone.0198961
Journal volume & issue
Vol. 13, no. 6
p. e0198961

Abstract

Read online

Understanding how regulatory elements control mammalian gene expression is a challenge of post-genomic era. We previously reported that size of proximal promoter architecture predicted the breadth of expression (fraction of tissues in which a gene is expressed). Herein, the contributions of individual transcription factors (TFs) were quantified. Several technologies of statistical modelling were utilized and compared: tree models, generalized linear models (GLMs, without and with regularization), Bayesian GLMs and random forest. Both linear and non-linear modelling strategies were explored. Encouragingly, different models led to similar statistical conclusions and biological interpretations. The majority of ENCODE TFs correlated positively with housekeeping expression, a minority correlated negatively. Thus, housekeeping expression can be understood as a cumulative effect of many types of TF binding sites. This is accompanied by the exclusion of fewer types of binding sites for TFs which are repressors, or support cell lineage commitment or temporarily inducible or spatially-restricted expression.