Cancer Informatics (Jun 2023)

Comparing Individualized Survival Predictions From Random Survival Forests and Multistate Models in the Presence of Missing Data: A Case Study of Patients With Oropharyngeal Cancer

  • Madeline R Abbott,
  • Lauren J Beesley,
  • Emily L Bellile,
  • Andrew G Shuman,
  • Laura S Rozek,
  • Jeremy M G Taylor

DOI
https://doi.org/10.1177/11769351231183847
Journal volume & issue
Vol. 22

Abstract

Read online

Background: In recent years, interest in prognostic calculators for predicting patient health outcomes has grown with the popularity of personalized medicine. These calculators, which can inform treatment decisions, employ many different methods, each of which has advantages and disadvantages. Methods: We present a comparison of a multistate model (MSM) and a random survival forest (RSF) through a case study of prognostic predictions for patients with oropharyngeal squamous cell carcinoma. The MSM is highly structured and takes into account some aspects of the clinical context and knowledge about oropharyngeal cancer, while the RSF can be thought of as a black-box non-parametric approach. Key in this comparison are the high rate of missing values within these data and the different approaches used by the MSM and RSF to handle missingness. Results: We compare the accuracy (discrimination and calibration) of survival probabilities predicted by both approaches and use simulation studies to better understand how predictive accuracy is influenced by the approach to (1) handling missing data and (2) modeling structural/disease progression information present in the data. We conclude that both approaches have similar predictive accuracy, with a slight advantage going to the MSM. Conclusions: Although the MSM shows slightly better predictive ability than the RSF, consideration of other differences are key when selecting the best approach for addressing a specific research question. These key differences include the methods’ ability to incorporate domain knowledge, and their ability to handle missing data as well as their interpretability, and ease of implementation. Ultimately, selecting the statistical method that has the most potential to aid in clinical decisions requires thoughtful consideration of the specific goals.