Flexible gold standards for transcription factor regulatory interactions in Escherichia coli K-12: architecture of evidence types

Paloma Lara; Socorro Gama-Castro; Heladia Salgado; Claire Rioualen; Víctor H. Tierrafría; Luis J. Muñiz-Rascado; César Bonavides-Martínez; Julio Collado-Vides

doi:10.3389/fgene.2024.1353553

Frontiers in Genetics (Mar 2024)

Affiliations

Paloma Lara: Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad S/N, Cuernavaca, Mexico
Socorro Gama-Castro: Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad S/N, Cuernavaca, Mexico
Heladia Salgado: Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad S/N, Cuernavaca, Mexico
Claire Rioualen: Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad S/N, Cuernavaca, Mexico
Víctor H. Tierrafría: Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad S/N, Cuernavaca, Mexico
Víctor H. Tierrafría: Department of Biomedical Engineering, Boston University, Boston, MA, United States
Luis J. Muñiz-Rascado: Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad S/N, Cuernavaca, Mexico
César Bonavides-Martínez: Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad S/N, Cuernavaca, Mexico
Julio Collado-Vides: Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad S/N, Cuernavaca, Mexico
Julio Collado-Vides: Department of Biomedical Engineering, Boston University, Boston, MA, United States
Julio Collado-Vides: Center for Genomic Regulation, The Barcelona Institute of Science and Technology, Universitat Pompeu Fabra, Barcelona, Spain

Journal volume & issue: Vol. 15

Abstract

Post-genomic implementations have expanded the experimental strategies used to identify elements involved in the regulation of transcription initiation. Here, we present for the first time a detailed analysis of the sources of knowledge supporting the collection of transcriptional regulatory interactions (RIs) of Escherichia coli K-12. An RI groups a transcription factor (TF), its effect (positive or negative), and the regulated target: a promoter, a gene, or a transcription unit. We improved the evidence codes so that specific methods are incorporated and classified into independent groups. On this basis, we updated the computation of confidence levels (weak, strong, or confirmed) for the collection of RIs. These updates enabled us to map the RI set to the current collection of high-throughput (HT) TF-binding datasets in RegulonDB, obtained by ChIP-seq, ChIP-exo, gSELEX, and DAP-seq, thereby enriching the evidence for close to one-quarter (1329) of the current total of 5446 RIs. With the new computational capabilities of our improved annotation of evidence sources, we can now analyze the internal architecture of the evidence, its categories (experimental, classical, HT, computational), and its confidence levels. This analysis shows that the joint contribution of HT and computational methods increases the overall fraction of reliable RIs (the sum of those with confirmed and strong evidence) from 49% to 71%. Thus, the current collection has 3912 reliable RIs, 2718 (70%) of which have classical evidence and can be used to benchmark novel HT methods. Users can selectively exclude the method they want to benchmark or, for instance, keep only the confirmed interactions. The recovery of regulatory sites in RegulonDB by the different HT methods ranges from 33% for ChIP-exo to 76% for ChIP-seq, although, as discussed, many potential confounding factors limit the interpretation of these figures.
The improvements reported here provide a solid foundation for incorporating new methods and data, and for further integrating the diverse sources of knowledge about the different components of the transcriptional regulatory network. No other genomic database offers such a comprehensive, high-quality architecture of knowledge supporting a corpus of transcriptional regulatory interactions.
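The abstract describes two operations on the RI collection: recomputing a confidence level (weak, strong, confirmed) from evidence classified into independent groups, and letting users drop the method under benchmark or keep only confirmed interactions. The Python sketch below illustrates that filtering idea under loudly stated assumptions. The `Evidence` and `RegulatoryInteraction` classes, the `confidence` and `gold_standard` functions, the group labels, the per-method strength flags, and the rule that "confirmed" means strong evidence from at least two independent groups are all hypothetical simplifications, not RegulonDB's actual schema or its published confidence algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence for an RI (hypothetical schema, not RegulonDB's)."""
    method: str    # e.g. "ChIP-seq", "footprinting"
    category: str  # "classical", "HT", or "computational"
    strong: bool   # ASSUMED strength flag for this method
    group: str     # ASSUMED independent-group label used for confirmation


@dataclass
class RegulatoryInteraction:
    """A TF, its effect ("+" or "-"), and its target (promoter, gene, or TU)."""
    tf: str
    effect: str
    target: str
    evidence: List[Evidence] = field(default_factory=list)


def confidence(evidence: Sequence[Evidence]) -> Optional[str]:
    """Simplified, ASSUMED rule (not the published one):
    - "confirmed": strong evidence from at least two independent groups
    - "strong": strong evidence from exactly one group
    - "weak": only weak evidence
    - None: no evidence left at all
    """
    if not evidence:
        return None
    strong_groups = {e.group for e in evidence if e.strong}
    if len(strong_groups) >= 2:
        return "confirmed"
    if strong_groups:
        return "strong"
    return "weak"


def gold_standard(
    ris: Sequence[RegulatoryInteraction],
    exclude_method: Optional[str] = None,
    levels: Tuple[str, ...] = ("confirmed", "strong"),
) -> List[Tuple[RegulatoryInteraction, str]]:
    """Recompute each RI's confidence without `exclude_method` and keep
    only the RIs whose recomputed level is in `levels`."""
    selected = []
    for ri in ris:
        kept = [e for e in ri.evidence if e.method != exclude_method]
        level = confidence(kept)
        if level in levels:
            selected.append((ri, level))
    return selected


if __name__ == "__main__":
    # Toy data: TF/target names are placeholders; strength flags are illustrative.
    chip = Evidence("ChIP-seq", "HT", True, "HT-binding")
    footprint = Evidence("footprinting", "classical", True, "classical-binding")
    motif = Evidence("motif scan", "computational", False, "computational")
    ris = [
        RegulatoryInteraction("TF_A", "+", "promoter_1", [chip, footprint]),
        RegulatoryInteraction("TF_B", "-", "gene_2", [chip, motif]),
        RegulatoryInteraction("TF_C", "+", "tu_3", [motif]),
    ]
    # Reliable RIs (confirmed + strong) with all evidence kept.
    for ri, level in gold_standard(ris):
        print(ri.tf, level)  # TF_A confirmed, then TF_B strong
    # Benchmarking ChIP-seq: drop it and recompute before filtering.
    for ri, level in gold_standard(ris, exclude_method="ChIP-seq"):
        print(ri.tf, level)  # TF_A strong
```

Passing `levels=("confirmed",)` mirrors the other option mentioned in the abstract, restricting the gold standard to confirmed interactions only. Recomputing the level after exclusion (rather than just hiding the method) matters because an RI supported only by the method under test should not count as independent evidence for that same method.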

Published in Frontiers in Genetics

ISSN: 1664-8021 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Biology (General): Genetics
Website: http://journal.frontiersin.org/journal/genetics
