PLoS ONE (Sep 2010)
A unified 35-gene signature for both subtype classification and survival prediction in diffuse large B-cell lymphomas.
Abstract
Cancer subtype classification and survival prediction both bear directly on a patient's treatment plan, making them fundamental medical problems. Although the two are interrelated learning problems, most studies tackle each separately. In this paper, gene expression levels are used for both cancer subtype classification and survival prediction. We considered 350 diffuse large B-cell lymphoma (DLBCL) subjects drawn from four groups of patients: activated B-cell-like subtype dead, activated B-cell-like subtype alive, germinal center B-cell-like subtype dead, and germinal center B-cell-like subtype alive. As classification features, we used 11,271 gene expression levels for each subject. The features were first ranked by the mRMR (Maximum Relevance Minimum Redundancy) principle and then selected by the IFS (Incremental Feature Selection) procedure. A 35-gene signature was selected by the IFS procedure, and the patients were divided into the above-mentioned four groups. These four groups were then combined in two ways: activated versus germinal center for subtype prediction, and alive versus dead for survival prediction. The subtype prediction accuracy of the 35-gene signature was 98.6%. We calculated the cumulative survival time of the high-risk and low-risk groups by the Kaplan-Meier method; the log-rank test p-value was 5.98e-08. Our methodology provides a way to study subtype classification and survival prediction simultaneously. Our results suggest that for some diseases, especially cancer, subtype classification may be used to predict survival and, conversely, survival prediction features may shed light on subtype features.
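For readers unfamiliar with the Kaplan-Meier method mentioned above, the following is a minimal sketch of the product-limit estimator in plain Python. It is not the authors' code: the function name `kaplan_meier`, its arguments, and the toy follow-up data are all assumptions made for illustration, and the abstract does not state which software was used.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time of each subject (hypothetical units)
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) pairs, one per
    distinct time at which at least one death occurred.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []

    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone who died or was censored at t leaves the risk set.
        at_risk -= leaving[t]
    return curve


# Toy data, invented for this example only.
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(f"t={t}: S(t)={s:.2f}")
```

Computing one such curve for the high-risk group and one for the low-risk group gives the two survival curves that a log-rank test then compares.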