Graph Theoretic and Pearson Correlation-Based Discovery of Network Biomarkers for Cancer

Raihanul Bari Tanvir; Tasmia Aqila; Mona Maharjan; Abdullah Al Mamun; Ananda Mohan Mondal

doi:10.3390/data4020081

Data (Jun 2019)

Affiliations

Raihanul Bari Tanvir: School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
Tasmia Aqila: School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
Mona Maharjan: School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
Abdullah Al Mamun: School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
Ananda Mohan Mondal: School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA

DOI: https://doi.org/10.3390/data4020081
Journal volume & issue: Vol. 4, no. 2
p. 81

Abstract

Two graph theoretic concepts, cliques and bipartite graphs, are explored to identify network biomarkers for cancer at the gene network level. The rationale is that a group of genes works together, forming a cluster or clique-like structure, to initiate a cancer. After initiation, the disease signal passes to the next group of genes, related to the second stage of the cancer, and this hand-off can be represented as a bipartite graph. In other words, bipartite graphs represent the cross-talk among genes between two disease stages. To test this hypothesis, gene expression values for three cancers, breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COAD), and glioblastoma multiforme (GBM), are analyzed. First, a co-expression gene network is generated from highly correlated gene pairs, those with a Pearson correlation coefficient ≥ 0.9. Second, clique structures of all sizes are isolated from the co-expression network. Third, these cliques are combined into three kinds of biomarker modules: maximal clique-like modules, 2-clique-1-bipartite modules, and 3-clique-2-bipartite modules. The list of biomarker genes discovered from these network modules is validated, in terms of network properties and survival analysis, as essential genes for causing cancer. This list will help biologists design wet lab experiments to further elucidate the complex mechanism of cancer.
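
The first two steps of the abstract (thresholding Pearson correlations to build a co-expression network, then isolating cliques) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the use of the signed (not absolute) coefficient, the basic Bron-Kerbosch enumeration, and the `cross_edges` helper for the bipartite cross-talk between two cliques are all assumptions made for illustration, and any gene labels passed in are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0 (assumption).
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_network(expr, threshold=0.9):
    """Adjacency sets linking gene pairs whose correlation is >= threshold.

    `expr` maps a gene label to its expression values across samples.
    Assumption: the signed coefficient is thresholded, not its absolute value.
    """
    adj = {gene: set() for gene in expr}
    for g, h in combinations(expr, 2):
        if pearson(expr[g], expr[h]) >= threshold:
            adj[g].add(h)
            adj[h].add(g)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored.
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def cross_edges(adj, clique_a, clique_b):
    """Edges running between two disjoint cliques (the bipartite part)."""
    return {(u, v) for u in clique_a for v in clique_b if v in adj[u]}
```

Pure Python keeps the sketch dependency-free, but the pairwise loop is quadratic in the number of genes and the clique enumeration is exponential in the worst case, so genome-scale data would call for a vectorized correlation matrix and a dedicated graph library.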

Published in Data

ISSN: 2306-5729 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Bibliography. Library science. Information resources
Website: http://www.mdpi.com/journal/data
