Scientific Reports (Jul 2017)
Spectral clustering using Nyström approximation for the accurate identification of cancer molecular subtypes
Abstract
A major challenge in clinical cancer research is the accurate identification of molecular subtypes. Although unsupervised clustering methods have been applied for class discovery, clustering remains a bottleneck in developing accurate methods for molecular subtype discovery. In this analysis, we hypothesize that spectral clustering can identify molecular subtypes that correlate with survival outcomes. We propose an accurate subtype identification method based on spectral clustering, Cancer Subtype Identification with Spectral Clustering using Nyström approximation (CSISCN). CSISCN can be used to improve gene expression-based identification of breast cancer molecular subtypes. We demonstrated that CSISCN identified molecular subtypes with distinct clinical outcomes and that the number of subtypes it identified was valid. Furthermore, the subtypes identified by CSISCN showed improved clinical and molecular relevance, and CSISCN significantly outperformed consensus clustering and spectral clustering methods. To test the general applicability of CSISCN, we further applied it to human colorectal cancer (CRC) and acute myeloid leukemia (AML) datasets and demonstrated superior performance compared with consensus clustering. In summary, CSISCN shows great potential for gene expression-based subtype identification.
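To make the core technique concrete, the following is a minimal sketch of generic spectral clustering with a Nyström approximation. It is not the CSISCN implementation. It assumes a Gaussian (RBF) affinity with a hand-picked bandwidth `sigma`, uniformly sampled landmark samples, the normalized affinity D^{-1/2} W D^{-1/2} with row-normalized eigenvectors (Ng–Jordan–Weiss style), and a plain k-means step. All function names and parameters (`nystrom_spectral_clustering`, `n_landmarks`, `sigma`) are hypothetical. The idea is to avoid building the full n × n affinity matrix W. Only the n × m block C between all samples and m landmarks is computed, and W is approximated as C A⁺ Cᵀ, where A is the m × m landmark block. In a subtyping setting, the rows of `X` would be samples (e.g. tumours) and the columns would be gene expression features. The toy data below are synthetic blobs, not expression data.

```python
import numpy as np


def gaussian_affinity(X, Y, sigma):
    """RBF affinities exp(-||x - y||^2 / (2 sigma^2)) between rows of X and rows of Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def kmeans(Z, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far.
        dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        d2 = np.sum((Z[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        new_labels = np.argmin(d2, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = Z[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels


def nystrom_spectral_clustering(X, k, n_landmarks, sigma, seed=0):
    """Hypothetical sketch: spectral clustering on a Nystrom-approximated affinity."""
    rng = np.random.default_rng(seed)
    landmarks = rng.choice(X.shape[0], size=n_landmarks, replace=False)

    # Only the n x m block C and the m x m block A of W are ever computed.
    C = gaussian_affinity(X, X[landmarks], sigma)  # n x m
    A = C[landmarks]                               # m x m

    # Factor S with S S^T = A^+ (pseudo-inverse), dropping near-zero eigenvalues.
    evals, evecs = np.linalg.eigh(A)
    keep = evals > 1e-8 * evals.max()
    S = evecs[:, keep] / np.sqrt(evals[keep])      # m x r

    # Low-rank factor G with W ~= C A^+ C^T = G G^T.
    G = C @ S                                      # n x r
    degrees = np.maximum(G @ G.sum(axis=0), 1e-12) # approximate row sums of W
    F = G / np.sqrt(degrees)[:, None]              # D^{-1/2} G

    # Leading eigenvectors of D^{-1/2} W D^{-1/2} = F F^T are the left singular vectors of F.
    U, _, _ = np.linalg.svd(F, full_matrices=False)
    embedding = U[:, :k]
    embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
    return kmeans(embedding, k)


# Usage on synthetic data: three well-separated 2-D blobs of 50 points each.
data_rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * data_rng.standard_normal((50, 2)) for c in blob_centers])
truth = np.repeat([0, 1, 2], 50)
labels = nystrom_spectral_clustering(X, k=3, n_landmarks=30, sigma=1.0)
```

The design choice is the usual trade-off for the Nyström method. With m landmarks, the dominant cost falls from an eigendecomposition of an n × n matrix to an SVD of an n × r matrix (r ≤ m). The quality of the result then depends on how well the landmarks cover every cluster.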