eMBI: Boosting Gene Expression-based Clustering for Cancer Subtypes

Zheng Chang; Zhenjia Wang; Cody Ashby; Chuan Zhou; Guojun Li; Shuzhong Zhang; Xiuzhen Huang

doi:10.4137/CIN.S13777

Cancer Informatics (Jan 2014)

Affiliations

Zheng Chang: School of Mathematics, Shandong University, Jinan, Shandong, China.
Zhenjia Wang: School of Mathematics, Shandong University, Jinan, Shandong, China.
Cody Ashby: Molecular Biosciences Program, Arkansas State University, Jonesboro, AR, USA.
Chuan Zhou: School of Mathematics, Shandong University, Jinan, Shandong, China.
Guojun Li: Department of Computer Science, Arkansas State University, Jonesboro, AR, USA.
Shuzhong Zhang: Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN, USA.
Xiuzhen Huang: Molecular Biosciences Program, Arkansas State University, Jonesboro, AR, USA.

Journal volume & issue: Vol. 13s2

Abstract

Identifying clinically relevant subtypes of a cancer from gene expression data is a challenging and important problem in medicine, and is a prerequisite for providing specific and efficient treatments to patients with different subtypes. Matrix factorization offers one solution by finding checkerboard patterns in gene expression matrices. In gene expression profiles of cancer patients, these checkerboard patterns correspond to genes that are up- or down-regulated in patients with particular cancer subtypes. A matrix factorization framework for biclustering called Maximum Block Improvement (MBI) was recently proposed; however, it still suffers from several problems when applied to the analysis of cancer gene expression data. In this study, we developed a number of effective strategies to improve MBI and designed a new program called enhanced MBI (eMBI), which identifies cancer subtypes more effectively and efficiently. Our tests on several gene expression profiling datasets of cancer patients consistently indicate that eMBI achieves significant improvements over MBI in cancer subtype prediction accuracy, robustness, and running time. In addition, eMBI performs much better than nonnegative matrix factorization (NMF), another widely used matrix factorization method, and than hierarchical clustering, which is often the first choice of clinical analysts in practice.
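
To make the checkerboard idea concrete, the sketch below factorizes a small synthetic genes-by-patients matrix and reads a subtype label for each patient off the factors. It is only an illustration under stated assumptions: it uses plain NMF with Lee-Seung multiplicative updates (the baseline named in the abstract), not MBI or eMBI, and the function names, matrix sizes, and expression values are all hypothetical.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate V (genes x patients) as W @ H with nonnegative factors.

    Uses Lee-Seung multiplicative updates for the Frobenius-norm objective.
    This is generic NMF, not the MBI/eMBI algorithm.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_patients = V.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_patients)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Remove the scale ambiguity: unit-norm columns of W, rescaled rows of H.
    norms = np.linalg.norm(W, axis=0)
    return W / norms, H * norms[:, None]


def make_checkerboard(genes_per_block=20, patients_per_subtype=15, seed=0):
    """Synthetic data (an assumption): two gene blocks, two patient subtypes."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], patients_per_subtype)
    # Low background expression everywhere.
    V = rng.random((2 * genes_per_block, labels.size)) * 0.5
    # Block A is up-regulated in subtype 0, block B in subtype 1.
    V[:genes_per_block, labels == 0] += 3.0
    V[genes_per_block:, labels == 1] += 3.0
    return V, labels


V, true_labels = make_checkerboard()
W, H = nmf(V, rank=2)

# Assign each patient to the factor with the largest coefficient.
predicted = H.argmax(axis=0)

# Factor order is arbitrary, so score agreement up to a label swap.
agree = float((predicted == true_labels).mean())
accuracy = max(agree, 1.0 - agree)
print(f"subtype agreement (up to relabeling): {accuracy:.2f}")
```

On real expression data the number of subtypes is unknown and the blocks are far noisier, which is where method-specific strategies such as those in eMBI would matter; this toy case only shows how factor coefficients translate into cluster labels.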

Published in Cancer Informatics

ISSN: 1176-9351 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Neoplasms. Tumors. Oncology. Including cancer and carcinogens
Website: https://journals.sagepub.com/home/cix
