PeerJ (Jun 2019)

Characterization of bidirectional gene pairs in The Cancer Genome Atlas (TCGA) dataset

  • Juchuanli Tu,
  • Xiaolu Li,
  • Jianjun Wang

DOI
https://doi.org/10.7717/peerj.7107
Journal volume & issue
Vol. 7
p. e7107

Abstract


The term “bidirectional gene pair” denotes a head-to-head gene organization in which the transcription start sites of two genes are located on opposite strands of genomic DNA within a region of one kb. Although bidirectional gene pairs are well characterized, little is known about their expression profiles and regulatory features in tumorigenesis. We used RNA-seq data from The Cancer Genome Atlas (TCGA) for a systematic analysis of the expression profiles of bidirectional gene pairs in 13 cancer datasets. Three types of control gene pairs were used: gene pairs on opposite strands with transcription end sites within one kb of each other (CG1), gene pairs on the same strand separated by 1–10 kb (CG2), and pairs of two randomly chosen genes (random). We identified and characterized up-/down-regulated genes by comparing expression levels between tumors and adjacent normal tissues in the 13 TCGA datasets. There was no consistently significant difference in the percentage of up-/down-regulated genes between bidirectional and control/random genes in most TCGA datasets. However, the percentage of bidirectional gene pairs comprising two up-regulated or two down-regulated genes was significantly higher than that of gene pairs from CG1 and CG2 in 12 and 11 of the analyzed TCGA datasets, respectively, and than that of random gene pairs in all 13 TCGA datasets. We then identified methylation-correlated bidirectional genes to explore the regulatory mechanism of bidirectional genes. As with the differentially expressed gene pairs, the two genes in a bidirectional pair were significantly more likely to both be hypo-methylation or both be hyper-methylation correlated genes in 12/13 TCGA datasets when compared with the CG2/random gene pairs, despite there being no significant difference between the percentages of hypo-/hyper-methylation correlated genes among bidirectional and CG2/random genes in most TCGA datasets.
Finally, we explored the correlation between bidirectional genes and patient survival, identifying prognostic bidirectional genes and prognostic bidirectional gene pairs in each TCGA dataset. Remarkably, we found a group of prognostic bidirectional gene pairs in which combinations of two protein-coding genes with different expression levels correlated with different prognoses in the analysis of overall survival (OS). The percentage of these gene pairs among bidirectional gene pairs was significantly higher than among control gene pairs in the COAD dataset and was not significantly lower in any of the 13 TCGA datasets.
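
The pair definitions given in the abstract (bidirectional, CG1, CG2) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Gene` record, the `classify_pair` function, and the coordinate conventions are hypothetical. It assumes that a bidirectional pair is a non-overlapping divergent pair, and that the 1–10 kb distance for CG2 is the gap between the two gene bodies, neither of which the abstract specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; start/end are leftmost/rightmost coordinates."""
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int
    end: int

    @property
    def tss(self) -> int:
        # Transcription start site: left end on "+", right end on "-".
        return self.start if self.strand == "+" else self.end

    @property
    def tes(self) -> int:
        # Transcription end site: right end on "+", left end on "-".
        return self.end if self.strand == "+" else self.start


def classify_pair(a: Gene, b: Gene, window: int = 1000) -> str:
    """Label a gene pair as 'bidirectional', 'CG1', 'CG2', or 'other'."""
    if a.chrom != b.chrom:
        return "other"
    if a.strand != b.strand:
        minus, plus = (a, b) if a.strand == "-" else (b, a)
        # Head-to-head: the minus-strand TSS lies just left of the plus-strand TSS
        # (assumption: the two genes do not overlap).
        if 0 <= plus.tss - minus.tss <= window:
            return "bidirectional"
        # Tail-to-tail (CG1): the plus-strand TES lies just left of the minus-strand TES.
        if 0 <= minus.tes - plus.tes <= window:
            return "CG1"
        return "other"
    # Same strand (CG2): assumed here to mean a gap of 1-10 kb between gene bodies.
    left, right = sorted((a, b), key=lambda g: g.start)
    gap = right.start - left.end
    return "CG2" if 1000 <= gap <= 10000 else "other"


# Toy coordinates, invented for illustration only.
head_to_head = (Gene("A", "chr1", "-", 1000, 5000), Gene("B", "chr1", "+", 5400, 9000))
tail_to_tail = (Gene("C", "chr1", "+", 1000, 5000), Gene("D", "chr1", "-", 5400, 9000))
same_strand = (Gene("E", "chr1", "+", 1000, 5000), Gene("F", "chr1", "+", 8000, 12000))

for pair in (head_to_head, tail_to_tail, same_strand):
    print(pair[0].name, pair[1].name, classify_pair(*pair))
```

With a real annotation, the same function would be applied to neighboring genes on each chromosome. The random control group described in the abstract would instead be built by sampling two genes without regard to position.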

Keywords