Correlating gene expression levels with transcription factor binding sites facilitates identification of key transcription factors from transcriptome data

Tinghua Huang; Siqi Niu; Fanghong Zhang; Binyu Wang; Jianwu Wang; Guoping Liu; Min Yao

doi:10.3389/fgene.2024.1511456

Frontiers in Genetics (Nov 2024)

Affiliations

Tinghua Huang: College of Animal Science and Technology, Yangtze University, Jingzhou, China
Siqi Niu: College of Animal Science and Technology, Yangtze University, Jingzhou, China
Fanghong Zhang: College of Animal Science and Technology, Yangtze University, Jingzhou, China
Binyu Wang: College of Animal Science and Technology, Yangtze University, Jingzhou, China
Jianwu Wang: College of Agriculture, Yangtze University, Jingzhou, China
Guoping Liu: College of Animal Science and Technology, Yangtze University, Jingzhou, China
Min Yao: College of Animal Science and Technology, Yangtze University, Jingzhou, China

Journal volume & issue: Vol. 15

Abstract

Identifying key transcription factors from transcriptome data by correlating gene expression levels with transcription factor binding sites is an important step in transcriptome data analysis. Typically, a threshold is set to retain only the top-ranked differentially expressed genes and the top-ranked transcription factor binding sites. However, correlation analysis of such filtered data often yields spurious correlations. In this study, we tested four methods for creating the gene expression input (a ranked gene list) for the correlation analysis: star coordinate map transformation (START), expression differential score (ED), preferential expression measure (PEM), and specificity measure (SPM). Kendall’s tau correlation algorithms implementing the standard (STD), LINEAR, MIX-LINEAR, DENSITY-CURVE, and MIXED-DENSITY-CURVE weighting methods were then used to identify key transcription factors. ED was the optimal method for creating a ranked gene list from filtered expression data, and it addresses the “unable to detect negative correlation” fallacy that affects the other methods. MIXED-DENSITY-CURVE was the most sensitive method for identifying transcription factors from gene sets and lists in which only the top proportion was correlated. Ultimately, 644 transcription factor candidates were identified from the transcriptome data of 1,206 cell lines, six of which were validated by wet-lab experiments. The Jinzer and Flaver software implementing these methods can be obtained from http://www.thua45/cn/flaver under a free academic license.
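To make the correlation step concrete, the following is a minimal sketch of a Kendall's tau statistic with optional per-gene weights. It is not the Jinzer or Flaver implementation: the function names, the pairwise weighting scheme (product of the two genes' weights), and the rank-based `linear_weights` helper are all assumptions chosen for illustration, and the abstract does not specify how the LINEAR or DENSITY-CURVE weights are actually defined.

```python
from itertools import combinations


def weighted_kendall_tau(x, y, weights=None):
    """Kendall's tau-a style statistic with optional per-item weights.

    Assumption for illustration: each pair (i, j) contributes
    weights[i] * weights[j]. With all weights equal to 1 this reduces
    to the standard tau-a, where tied pairs contribute zero.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if weights is None:
        weights = [1.0] * n
    num = 0.0
    den = 0.0
    for i, j in combinations(range(n), 2):
        w = weights[i] * weights[j]
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        sign_x = (dx > 0) - (dx < 0)
        sign_y = (dy > 0) - (dy < 0)
        num += w * sign_x * sign_y  # +w if concordant, -w if discordant
        den += w
    return num / den if den else 0.0


def linear_weights(scores):
    """Hypothetical linear weighting: the top-ranked item gets weight 1,
    the lowest-ranked item gets weight 1/n."""
    n = len(scores)
    order = sorted(range(n), key=lambda k: scores[k], reverse=True)
    w = [0.0] * n
    for rank, k in enumerate(order):
        w[k] = (n - rank) / n
    return w


# Toy data (made up): expression scores for five genes and the
# binding-site scores of one transcription factor for the same genes.
expression = [5.0, 3.0, 1.0, -2.0, -4.0]
binding = [0.9, 0.7, 0.4, 0.5, 0.1]

tau_std = weighted_kendall_tau(expression, binding)
tau_lin = weighted_kendall_tau(expression, binding, linear_weights(expression))
print(tau_std, tau_lin)
```

In this toy data, nine of the ten gene pairs are concordant and one is discordant, so the unweighted statistic is 0.8. The single discordant pair involves lower-ranked genes, so down-weighting the bottom of the list raises the weighted value above the unweighted one. A signed score list also lets the statistic go negative, which a list truncated to positive scores alone could not show.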

Published in Frontiers in Genetics

ISSN: 1664-8021 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Biology (General): Genetics
Website: http://journal.frontiersin.org/journal/genetics
