Iranian Journal of Epidemiology (Mar 2019)
Comparison of Random Forest and Logistic Regression Methods in Predicting Mortality in Colorectal Cancer Patients and Identifying Related Factors
Abstract
Background and Objectives: This study aimed to predict mortality among Iranian patients with colorectal cancer and to identify the factors affecting their mortality, using random forest and logistic regression methods.

Methods: This retrospective study used data on 304 patients from the colorectal cancer registry of the Gastroenterology and Liver Research Center of Shahid Beheshti University of Medical Sciences, covering the years 2009 to 2014. The data were analyzed with random forest and logistic regression methods in R version 3.4.3.

Results: The random forest method selected ten important variables related to colorectal cancer deaths. Several criteria, including the area under the receiver operating characteristic (ROC) curve (AUC), were used to compare random forest with logistic regression. According to both criteria, the five most important variables ranked by random forest were cancer stage, age at diagnosis, the patient's age, HLA, and degree of tumor differentiation. On the different criteria, random forest performed better than logistic regression (area under the ROC curve: 98% for random forest and 80% for logistic regression).

Conclusion: Cancer stage, age at diagnosis, the patient's age, HLA, and degree of differentiation are the most important factors affecting mortality in colorectal cancer. Patients' survival may be prolonged through early diagnosis and screening programs.
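The models are compared mainly by the area under the ROC curve, but the abstract does not say how it was computed (the analysis itself was done in R 3.4.3). As a rough illustration only, not the authors' code, the sketch below assumes a binary outcome coded 1 = death and 0 = survival, and computes AUC through its Mann-Whitney interpretation. AUC is the probability that a randomly chosen patient who died gets a higher predicted risk than a randomly chosen survivor, with ties counted as one half. Under this reading, an AUC of 98% versus 80% means random forest ranked such pairs correctly far more often. The data and the function name `roc_auc` are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise formula.

    labels: hypothetical coding, 1 = died, 0 = survived.
    scores: predicted risk of death from any model (e.g. random forest
    or logistic regression). Ties between a positive and a negative
    case contribute one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # death case ranked above survivor: correct order
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy outcomes and predicted risks for two models.
    y = [1, 1, 0, 0, 1, 0]
    model_a = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1]  # separates the classes perfectly
    model_b = [0.6, 0.4, 0.5, 0.2, 0.7, 0.3]  # one misordered pair
    print(roc_auc(y, model_a))  # -> 1.0
    print(roc_auc(y, model_b))  # -> about 0.889 (8 of 9 pairs ordered correctly)
```

The pairwise loop is quadratic in sample size. That is fine for a cohort of a few hundred patients, though rank-based implementations scale better for larger data.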