Diagnostic and Prognostic Research (Jan 2022)
Graphical calibration curves and the integrated calibration index (ICI) for competing risk models
Abstract
Background: Assessing calibration (the agreement between estimated risk and observed proportions) is an important component of deriving and validating clinical prediction models. Methods for assessing the calibration of prognostic models for use with competing risk data have received little attention.

Methods: We propose a method for graphically assessing the calibration of competing risk regression models. Our proposed method can be used to assess the calibration of any model for estimating incidence in the presence of competing risks (e.g., a Fine-Gray subdistribution hazard model, a combination of cause-specific hazard functions, or a random survival forest). Our method is based on using the Fine-Gray subdistribution hazard model to regress the cumulative incidence function of the cause-specific outcome of interest on the predicted outcome risk of the model whose calibration we want to assess. We also provide modifications of three numerical calibration metrics, the integrated calibration index (ICI), E50, and E90, for use with competing risk data. We conducted a series of Monte Carlo simulations to evaluate the performance of these calibration measures when the underlying model was correctly specified, when it was mis-specified, and when the incidence of the cause-specific outcome differed between the derivation and validation samples. We illustrated the usefulness of the calibration curves and the numerical calibration metrics by comparing the calibration of a Fine-Gray subdistribution hazard model with that of random survival forests for predicting cardiovascular mortality in patients hospitalized with heart failure.

Results: The simulations indicated that the method for constructing graphical calibration curves and the associated calibration metrics performed as desired. We also demonstrated that the numerical calibration metrics can be used as optimization criteria when tuning machine learning methods for competing risk outcomes.

Conclusions: The calibration curves and numerical calibration metrics permit a comprehensive comparison of the calibration of different competing risk models.
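As a concrete illustration of the numerical metrics named above, the following sketch (not taken from the article; written in Python with hypothetical inputs) computes ICI, E50, and E90 as the mean, median, and 90th percentile of the absolute difference between model-predicted risks and the smoothed observed risks read off an estimated calibration curve at the same points.

```python
import numpy as np

def calibration_metrics(predicted_risk, calibration_curve_risk):
    """Return (ICI, E50, E90) given model-predicted risks and the corresponding
    smoothed observed risks from an estimated calibration curve.
    Both inputs are hypothetical arrays of the same length."""
    abs_diff = np.abs(np.asarray(predicted_risk) - np.asarray(calibration_curve_risk))
    ici = abs_diff.mean()              # mean absolute difference
    e50 = np.median(abs_diff)          # median absolute difference
    e90 = np.percentile(abs_diff, 90)  # 90th percentile of absolute differences
    return ici, e50, e90

# Example with made-up values: predicted risks of the cause-specific outcome
# and the corresponding calibration-curve estimates of observed incidence.
pred = np.array([0.05, 0.10, 0.20, 0.35, 0.50])
obs = np.array([0.06, 0.12, 0.18, 0.40, 0.47])
print(calibration_metrics(pred, obs))
```

In the competing risk setting described in the article, the smoothed observed risks would come from the proposed Fine-Gray-based calibration curve rather than from a conventional survival calibration curve; the summary step shown here is the same in either case.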
Keywords