On Fair Performance Comparison between Random Survival Forest and Cox Regression: An Example of Colorectal Cancer Study

Sirin Cetin; Ayse Ulgen; Isa Dede; Wentian Li

doi:10.28991/SciMedJ-2021-0301-9

SciMedicine Journal (Mar 2021)


Affiliations

Sirin Cetin: Department of Biostatistics, Faculty of Medicine, Tokat Gaziosmanpasa University
Ayse Ulgen: Department of Biostatistics, Faculty of Medicine, Girne American University, Karmi
Isa Dede: Medical Oncology, Faculty of Medicine, Mustafa Kemal University, Antakya
Wentian Li: The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY

Journal volume & issue: Vol. 3, no. 1
pp. 66 – 76

Abstract


Random Forest (RF), a largely model-free and robust machine learning method, has been successfully applied to right-censored survival data under the name Random Survival Forest (RSF). However, RF/RSF has its own distinct strategies for classification and prediction. First, it is an ensemble classifier, and its performance is an average over multiple rounds of data fitting. Second, each training set is generated by bootstrap (sampling with replacement), so it contains roughly 2/3 of all samples, some of them used repeatedly, while the testing set consists of the samples that were not drawn (the out-of-bag samples). Neither feature is intrinsic to Cox regression or other single classifiers, and ignoring them could lead to a biased (partial) comparison of the two methods' performance. Using a colorectal cancer survival dataset, we illustrate the problems of using k-fold cross-validation, of using only one resampling without an ensemble average, and of using the whole dataset for both fitting and testing in Cox regression when comparing it with RSF. We provide more accessible R code for simple calculation of the discordance index (D-index) and the unweighted integrated Brier score (IBS) for Cox regression, and of the unweighted IBS for RSF.
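The two RSF features described above (bootstrap training sets with out-of-bag testing, and averaging over many rounds) can be imposed on a single model such as Cox regression. The sketch below is a minimal, hypothetical Python illustration, not the authors' R code: `bootstrap_split`, `harrell_c_index` and `ensemble_oob_score` are names invented here, `fit` and `score` are user-supplied callables (for example, a Cox regression fit and a concordance computed on the out-of-bag rows), and Harrell's concordance index is used as the example metric. The paper's D-index and unweighted IBS are not reproduced.

```python
import random
from statistics import mean


def bootstrap_split(n, rng):
    """Draw n indices with replacement (in-bag); indices never drawn are out-of-bag.

    On average about 1 - (1 - 1/n)**n, i.e. roughly 63% (about 2/3) of the
    distinct samples end up in-bag, and the remaining ~37% are out-of-bag.
    """
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter time than j. It is concordant when i also has the higher
    predicted risk; ties in risk count 1/2. Tied times are skipped in this sketch.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def ensemble_oob_score(data, fit, score, n_rounds=100, seed=0):
    """Average an out-of-bag score over many bootstrap rounds.

    fit(train_rows) -> model and score(model, test_rows) -> float are
    hypothetical user-supplied callables (e.g. wrapping a Cox regression).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        in_bag, oob = bootstrap_split(len(data), rng)
        if len(oob) < 2:
            continue  # too few out-of-bag rows to form a pair
        model = fit([data[k] for k in in_bag])
        scores.append(score(model, [data[k] for k in oob]))
    return mean(scores)
```

Averaging the score over many bootstrap rounds, rather than relying on a single split or on k-fold cross-validation, mirrors how RSF reports its out-of-bag performance. In practice, `fit` would train the Cox model on the in-bag rows only, so that no out-of-bag row is ever used for fitting.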

Published in SciMedicine Journal

ISSN: 2704-9833 (Online)
Publisher: Ital Publication
Country of publisher: Italy
LCC subjects: Medicine: Internal medicine: Neoplasms. Tumors. Oncology. Including cancer and carcinogens; Medicine: Public aspects of medicine
Website: https://scimedjournal.org
