Frontiers in Applied Mathematics and Statistics (Oct 2023)
Simulating time-to-event data under the Cox proportional hazards model: assessing the performance of the non-parametric Flexible Hazards Method
Abstract
Numerous methods and approaches have been developed for generating time-to-event data from the Cox Proportional Hazards (CPH) model; however, they often require specification of a parametric distribution for the baseline hazard even though the CPH model itself makes no assumptions on the distribution of the baseline hazards. In line with the semi-parametric nature of the CPH model, a recently proposed method called the Flexible Hazards Method generates time-to-event data from a CPH model using a non-parametric baseline hazard function. While the initial results of this method are promising, it has not yet been comprehensively assessed with increasing covariates or against data generated under parametric baseline hazards. To fill this gap, we conducted a comprehensive study to benchmark the performance of the Flexible Hazards Method for generating data from a CPH model against parametric methods. Our results showed that with a single covariate and large enough assumed maximum time, the bias in the Flexible Hazards Method is 0.02 (with respect to the log hazard ratio) with a 95% confidence interval having coverage of 84.4%. This bias increases to 0.054 when there are 10 covariates under the same settings and the coverage of the 95% confidence interval decreases to 46.7%. In this paper, we explain the plausible reasons for this observed increase in bias and decrease in coverage as the number of covariates are increased, both empirically and theoretically, and provide readers and potential users of this method with some suggestions on how to best address these issues. In summary, the Flexible Hazards Method performs well when there are few covariates and the user wishes to simulate data from a non-parametric baseline hazard.
Keywords