Cancer Informatics (Jan 2014)

eMBI: Boosting Gene Expression-based Clustering for Cancer Subtypes

  • Zheng Chang,
  • Zhenjia Wang,
  • Cody Ashby,
  • Chuan Zhou,
  • Guojun Li,
  • Shuzhong Zhang,
  • Xiuzhen Huang

DOI
https://doi.org/10.4137/CIN.S13777
Journal volume & issue
Vol. 13s2

Abstract

Read online

Identifying clinically relevant subtypes of a cancer using gene expression data is a challenging and important problem in medicine, and is a necessary premise to provide specific and efficient treatments for patients of different subtypes. Matrix factorization provides a solution by finding checkerboard patterns in the matrices of gene expression data. In the context of gene expression profiles of cancer patients, these checkerboard patterns correspond to genes that are up- or down-regulated in patients with particular cancer subtypes. Recently, a new matrix factorization framework for biclustering called Maximum Block Improvement (MBI) is proposed; however, it still suffers several problems when applied to cancer gene expression data analysis. In this study, we developed many effective strategies to improve MBI and designed a new program called enhanced MBI (eMBI), which is more effective and efficient to identify cancer subtypes. Our tests on several gene expression profiling datasets of cancer patients consistently indicate that eMBI achieves significant improvements in comparison with MBI, in terms of cancer subtype prediction accuracy, robustness, and running time. In addition, the performance of eMBI is much better than another widely used matrix factorization method called nonnegative matrix factorization (NMF) and the method of hierarchical clustering, which is often the first choice of clinical analysts in practice.