Inferred regulons are consistent with regulator binding sequences in E. coli.

Sizhe Qiu; Xinlong Wan; Yueshan Liang; Cameron R Lamoureux; Amir Akbari; Bernhard O Palsson; Daniel C Zielinski

doi:10.1371/journal.pcbi.1011824

PLoS Computational Biology (Jan 2024)

Journal volume & issue: Vol. 20, no. 1, p. e1011824

Abstract

The transcriptional regulatory network (TRN) of E. coli consists of thousands of interactions between regulators and DNA sequences. Regulons are typically either determined from resource-intensive experimental measurement of functional binding sites or inferred from analysis of high-throughput gene expression datasets. Recently, independent component analysis (ICA) of RNA-seq compendia has been shown to be a powerful method for inferring bacterial regulons. However, it remains unclear to what extent the regulon structures predicted by ICA have a biochemical basis in promoter sequences. Here, we address this question by developing machine learning models that predict inferred regulon structures in E. coli based on promoter sequence features. Models were constructed successfully (cross-validation AUROC ≥ 0.8) for 85% (40/47) of ICA-inferred E. coli regulons. We found that: 1) the presence of a high-scoring regulator motif in the promoter region was sufficient to specify regulatory activity in 40% (19/47) of the regulons; 2) additional features, such as DNA shape and extended motifs that can account for regulator multimeric binding, helped to specify regulon structure for the remaining 60% of regulons (28/47); and 3) investigating regulons where initial machine learning models failed revealed new regulator-specific sequence features that improved model accuracy. Finally, we found that strong regulatory binding sequences underlie both the genes shared between ICA-inferred and experimental regulons and the genes in the E. coli core pan-regulon of Fur. This work demonstrates that the structure of ICA-inferred regulons can largely be understood through the strength of regulator binding sites in promoter regions, reinforcing the utility of top-down inference for regulon discovery.
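
The simplest case in the abstract, where a motif score alone separates regulon members from non-members, can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used machine learning models and further features such as DNA shape. The 4-bp consensus, the toy promoters, the membership labels, and the helper names `best_motif_score` and `auroc` are all assumptions invented for illustration. The sketch scores each promoter by its best-matching window against a position weight matrix, then measures how well that single score ranks members above non-members.

```python
# Hypothetical 4-bp consensus; real regulator motifs are longer and are
# derived from known binding sites. Scores are +1 for a match, -1 otherwise.
CONSENSUS = "GATA"
PWM = [{base: (1.0 if base == c else -1.0) for base in "ACGT"} for c in CONSENSUS]


def best_motif_score(promoter, pwm):
    """Return the highest PWM score over all windows of the promoter."""
    width = len(pwm)
    return max(
        sum(pwm[i][promoter[start + i]] for i in range(width))
        for start in range(len(promoter) - width + 1)
    )


def auroc(scores, labels):
    """AUROC as the chance that a member outranks a non-member (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy promoters; label 1 marks (made-up) regulon membership.
promoters = ["CCGATACC", "TTGATATT", "ACGATCAA", "CCCCCCCC", "TTTTGGGG", "ACGTACGT"]
labels = [1, 1, 1, 0, 0, 0]

scores = [best_motif_score(p, PWM) for p in promoters]
value = auroc(scores, labels)
# The abstract treats a cross-validation AUROC of at least 0.8 as success;
# this toy check is not cross-validated.
print(f"AUROC = {value:.2f}, meets 0.8 threshold: {value >= 0.8}")
```

A single best-window score corresponds only to the motif-presence feature. Regulons that need DNA shape or extended multimeric motifs would require additional feature columns and a fitted classifier evaluated under cross-validation.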

Published in PLoS Computational Biology

ISSN: 1553-734X (Print); 1553-7358 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Science: Biology (General)
Website: https://journals.plos.org/ploscompbiol/
