Cancer Informatics (Jul 2021)
Pan-Cancer Survival Classification With Clinicopathological and Targeted Gene Expression Features
Abstract
Prognostication for patients with cancer is important for clinical planning and management, but remains challenging given the large number of factors that can influence outcomes. There is therefore a need to identify features that robustly predict patient outcomes. We evaluated 8608 patient tumor samples across 16 cancer types from The Cancer Genome Atlas and generated a distinct survival classifier for each cancer type using clinical and histopathological data accessible to standard oncology workflows. With classifiers derived from clinical and histopathological features alone, model performance was cancer-type dependent: the area under the receiver operating characteristic curve (AUROC) ranged from 0.65 to 0.91 across all 16 cancer types for 1- and 3-year survival prediction, with classifiers for some cancer types consistently outperforming those for others. For cancers with poor model performance, we posited that adding more complex biomolecular features could improve prognostication where clinicopathological features were insufficient. For these cancers, we deployed a random-forest-embedded sequential forward selection approach that began with an initial subset of the 15 most predictive clinicopathological features and then sequentially appended the next most informative gene as an additional feature. With the inclusion of gene expression data, model performance for 3 selected cancers (glioblastoma, stomach/gastric adenocarcinoma, and ovarian serous carcinoma) increased markedly, from initial AUROC scores of 0.66, 0.69, and 0.67 to 0.76, 0.77, and 0.77, respectively. Overall, this study provides a thorough examination of the relative contributions of clinical, pathological, and gene expression data to predicting overall survival, and identifies cancer types for which clinical features are already strong predictors and those for which additional biomolecular information is needed.
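The sequential forward selection procedure and the AUROC metric described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`auroc`, `sequential_forward_selection`), the pluggable `score_fn` callback, the `max_added` cap, and the stop-when-no-improvement rule are all assumptions made for illustration. In a setting like the one described, `score_fn` would presumably train a random forest on the given feature subset and return a held-out AUROC; here it is left abstract so the sketch stays dependency-free.

```python
from typing import Callable, List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sequential_forward_selection(
    base_features: List[str],
    candidates: List[str],
    score_fn: Callable[[List[str]], float],
    max_added: int,
) -> Tuple[List[str], List[float]]:
    """Greedy forward selection starting from a fixed base feature set.

    base_features: features always included (e.g. clinicopathological ones).
    candidates:    features that may be appended one at a time (e.g. genes).
    score_fn:      hypothetical callback returning a model score (e.g. AUROC)
                   for a given list of feature names.
    max_added:     assumed cap on how many candidates may be appended.
    """
    selected = list(base_features)
    remaining = list(candidates)
    history = [score_fn(selected)]  # score of the base set alone

    for _ in range(max_added):
        if not remaining:
            break
        # Score each remaining candidate when appended to the current set
        # and keep the best one (ties are broken by feature name).
        best_score, best_feature = max(
            (score_fn(selected + [f]), f) for f in remaining
        )
        # Assumed stopping rule: stop once no candidate improves the score.
        if best_score <= history[-1]:
            break
        selected.append(best_feature)
        remaining.remove(best_feature)
        history.append(best_score)

    return selected, history
```

Calling `sequential_forward_selection(clinical_features, gene_names, score_fn, max_added=10)` with hypothetical inputs returns the final feature list together with the score after each accepted addition, which makes it easy to see how much each appended gene contributes beyond the baseline features.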