BMC Medical Informatics and Decision Making (May 2024)

Improved nonparametric survival prediction using CoxPH, Random Survival Forest & DeepHit Neural Network

  • Naseem Asghar,
  • Umair Khalil,
  • Basheer Ahmad,
  • Huda M. Alshanbari,
  • Muhammad Hamraz,
  • Bakhtiyar Ahmad,
  • Dost Muhammad Khan

DOI
https://doi.org/10.1186/s12911-024-02525-z
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 17

Abstract

Read online

Abstract In recent times, time-to-event data such as time to failure or death is routinely collected alongside high-throughput covariates. These high-dimensional bioinformatics data often challenge classical survival models, which are either infeasible to fit or produce low prediction accuracy due to overfitting. To address this issue, the focus has shifted towards introducing a novel approaches for feature selection and survival prediction. In this article, we propose a new hybrid feature selection approach that handles high-dimensional bioinformatics datasets for improved survival prediction. This study explores the efficacy of four distinct variable selection techniques: LASSO, RSF-vs, SCAD, and CoxBoost, in the context of non-parametric biomedical survival prediction. Leveraging these methods, we conducted comprehensive variable selection processes. Subsequently, survival analysis models—specifically CoxPH, RSF, and DeepHit NN—were employed to construct predictive models based on the selected variables. Furthermore, we introduce a novel approach wherein only variables consistently selected by a majority of the aforementioned feature selection techniques are considered. This innovative strategy, referred to as the proposed method, aims to enhance the reliability and robustness of variable selection, subsequently improving the predictive performance of the survival analysis models. To evaluate the effectiveness of the proposed method, we compare the performance of the proposed approach with the existing LASSO, RSF-vs, SCAD, and CoxBoost techniques using various performance metrics including integrated brier score (IBS), concordance index (C-Index) and integrated absolute error (IAE) for numerous high-dimensional survival datasets. The real data applications reveal that the proposed method outperforms the competing methods in terms of survival prediction accuracy.

Keywords