BMC Bioinformatics (May 2018)
polyClustR: defining communities of reconciled cancer subtypes with biological and prognostic significance
Abstract
Background
To ensure that cancer patients are stratified towards the treatments from which they will benefit most, it is a priority to define robust molecular subtypes by applying clustering methods to high-dimensional biological data. When different methods produce different numbers of clusters for the same data, it is difficult to arrive at an optimal solution. Here, we introduce “polyClustR”, a tool that reconciles the clusters identified by different methods into subtype “communities” using either a hypergeometric test or a measure of the relative proportion of common samples.

Results
The polyClustR pipeline was first tested on a breast cancer dataset to demonstrate that its results are compatible with, and add to, the current understanding of this well-characterised cancer. Two uveal melanoma datasets were then used to identify and validate novel subtype communities with significant differences in metastasis-free prognosis and with associations to known chromosomal aberrations.

Conclusion
We demonstrate the value of the polyClustR approach, which applies multiple consensus clustering algorithms and systematically reconciles their results, for identifying novel subtype communities in two cancer types. These communities are nevertheless compatible with the established understanding of these diseases. An R implementation of the pipeline is available at: https://github.com/syspremed/polyClustR
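The reconciliation idea can be illustrated with a minimal Python sketch. This is not the polyClustR implementation, which is written in R. The input format, the function names (`hypergeom_upper_tail`, `clusters_from_labels`, `reconcile`), the definition of the relative-proportion measure (shared samples divided by the size of the smaller cluster), and the use of simple connected components to form communities are all assumptions made for illustration. Two clusters from different methods are linked when they share more samples than expected by chance under a one-sided hypergeometric test or, alternatively, when their proportion of common samples exceeds a threshold.

```python
from itertools import combinations
from math import comb


def hypergeom_upper_tail(overlap, n_total, size_a, size_b):
    """P(X >= overlap), X ~ Hypergeometric(n_total samples, size_a in cluster A, size_b drawn as cluster B)."""
    denom = comb(n_total, size_b)
    upper = min(size_a, size_b)
    # Sum the probabilities of seeing at least `overlap` shared samples by chance.
    return sum(comb(size_a, i) * comb(n_total - size_a, size_b - i)
               for i in range(overlap, upper + 1)) / denom


def clusters_from_labels(results):
    """Hypothetical input: {method: {sample_id: label}} -> {(method, label): set of samples}."""
    clusters = {}
    for method, labels in results.items():
        for sample, label in labels.items():
            clusters.setdefault((method, label), set()).add(sample)
    return clusters


def reconcile(results, alpha=0.05, use_proportion=False, min_proportion=0.5):
    """Group clusters from different methods into communities (illustrative sketch only).

    Assumes every method clustered the same set of samples. Connected components
    stand in for the community-detection step; no multiple-testing correction is applied.
    """
    clusters = clusters_from_labels(results)
    n_total = len(set().union(*clusters.values()))
    parent = {c: c for c in clusters}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(clusters, 2):
        if a[0] == b[0]:
            continue  # only compare clusters produced by different methods
        overlap = len(clusters[a] & clusters[b])
        if use_proportion:
            smaller = min(len(clusters[a]), len(clusters[b]))
            linked = overlap / smaller >= min_proportion
        else:
            p = hypergeom_upper_tail(overlap, n_total, len(clusters[a]), len(clusters[b]))
            linked = p < alpha
        if linked:
            parent[find(a)] = find(b)

    communities = {}
    for c in clusters:
        communities.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in communities.values())
```

For example, if two methods each split 12 samples into the same two groups of six, each matching pair shares all six samples (hypergeometric p = 1/924), so `reconcile` returns two communities, each holding one cluster from each method. A real analysis would also need a multiple-testing correction and the community-detection procedure actually used by polyClustR, neither of which is reproduced here.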
Keywords