PeerJ (Mar 2017)

From big data to diagnosis and prognosis: gene expression signatures in liver hepatocellular carcinoma

  • Hong Yang,
  • Xin Zhang,
  • Xiao-yong Cai,
  • Dong-yue Wen,
  • Zhi-hua Ye,
  • Liang Liang,
  • Lu Zhang,
  • Han-lin Wang,
  • Gang Chen,
  • Zhen-bo Feng

DOI
https://doi.org/10.7717/peerj.3089
Journal volume & issue
Vol. 5
p. e3089

Abstract


Background Liver hepatocellular carcinoma accounts for the overwhelming majority of primary liver cancers, and its belated diagnosis and poor prognosis call for the discovery of novel biomarkers. In the era of big data, innovative bioinformatics and computational techniques can prove highly helpful in that search.
Methods Big data aggregated from The Cancer Genome Atlas and from Natural Language Processing were integrated to generate a list of differentially expressed genes. To identify relevant signaling pathways, these genes were subjected to Gene Ontology enrichment analysis, Kyoto Encyclopedia of Genes and Genomes and Panther pathway enrichment analyses, and protein-protein interaction network analysis. The pathway that ranked high in the enrichment analysis was investigated further, and selected top-priority genes were assessed for their diagnostic and prognostic value.
Results A list of 389 genes was generated by overlapping the genes from The Cancer Genome Atlas and Natural Language Processing. Three pathways ranked highest, and the one specifically associated with cancer, ‘pathways in cancer,’ was analyzed together with its four highlighted genes, namely BIRC5, E2F1, CCNE1, and CDKN2A, which were validated using Oncomine. The detection pool composed of the four genes showed strong diagnostic power, with an integrated AUC of 0.990 (95% CI [0.982–0.998], P < 0.001, sensitivity: 96.0%, specificity: 96.5%). BIRC5 (P = 0.021) and CCNE1 (P = 0.027) were associated with poor prognosis, while CDKN2A (P = 0.066) and E2F1 (P = 0.088) showed no statistically significant association.
Discussion The study illustrates liver hepatocellular carcinoma gene signatures, related pathways and networks from the perspective of big data, highlighting the high-priority, cancer-specific pathway ‘pathways in cancer.’ The detection pool of the four highlighted genes, namely BIRC5, E2F1, CCNE1 and CDKN2A, should be investigated further given its strong diagnostic performance, and the prognostic power of BIRC5 and CCNE1 is equally worthy of attention.

Keywords