BMC Medical Informatics and Decision Making (Jan 2024)
Development and evaluation of regression tree models for predicting in-hospital mortality of a national registry of COVID-19 patients over six pandemic surges
Abstract
Background: Objective prognostic information is essential for good clinical decision-making. For previously unknown diseases, scarce evidence and limited tacit knowledge make this information hard to obtain. Prediction models can help, but they should be evaluated not only on how well they predict but also on how stable they remain under rapidly changing circumstances, both in the development of the disease and in the corresponding clinical response. This study aims to provide interpretable and actionable insights, particularly for clinicians. Using a national registry, we developed and evaluated two regression tree models for predicting in-hospital mortality of COVID-19 patients, one at admission and one 24 hours (24 h) after admission. We performed a retrospective analysis of routinely collected observational data.

Methods: Two regression tree models were developed, one for admission and one for 24 h after admission. Tree complexity was controlled via cross-validation to prevent overfitting. Predictive ability was assessed via bootstrapping using the area under the receiver operating characteristic curve (AUROC), the Brier score and calibration curves. The tree models were assessed on the stability of their probabilities and predictive ability and on the selected variables, and were compared to a full-fledged logistic regression model that uses variable selection and spline-based variable transformations. Participants were COVID-19 patients from all ICUs participating in the Dutch National Intensive Care Evaluation (NICE) registry who were admitted to the ICU between February 27, 2020, and November 23, 2021. From the NICE registry, we included demographic data, minimum and maximum values of physiological data in the first 24 h of ICU admission, and diagnoses (reason for admission as well as comorbidities) for model development. The main outcome measure was in-hospital mortality. We additionally analysed the length of stay (LoS) per patient subgroup and per survival status.

Results: A total of 13,369 confirmed COVID-19 patients from 70 ICUs were included (mortality rate 28%). The optimism-corrected AUROC was 0.72 (95% CI: 0.71–0.74) for the admission tree (seven paths) and 0.74 (0.74–0.77) for the 24 h tree (11 paths). Both regression trees were well calibrated, and variable selection was stable for both. The patient subgroups defined by the tree paths had survival probabilities comparable to those of the full-fledged logistic regression model, these probabilities were stable over six COVID-19 surges, and the subgroups had added predictive value over the individual patient variables.

Conclusions: We developed and evaluated regression trees that perform on par with a carefully crafted logistic regression model. The trees consist of homogeneous subgroups of patients described by simple, interpretable constraints on patient characteristics, thereby facilitating shared decision-making.
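The evaluation described above (bootstrapped, optimism-corrected AUROC together with the Brier score) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the study: the data are synthetic, a single-split tree on one hypothetical `ages` variable stands in for the full regression trees, and the optimism correction follows the common refit-on-resample scheme, which may differ from the procedure the authors used. All function names are illustrative.

```python
import random


def auroc(y_true, y_prob):
    """Fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one event and one non-event")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def fit_stump(x, y):
    """Depth-1 regression tree on one variable: pick the split x <= t that
    minimises the squared error; each leaf predicts its mean outcome."""
    best = None
    for t in sorted(set(x))[:-1]:  # dropping the largest value keeps the right leaf non-empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x identical: no split possible
        mean = sum(y) / len(y)
        return (x[0], mean, mean)
    return best[1:]  # (threshold, left leaf mean, right leaf mean)


def predict(stump, x):
    t, ml, mr = stump
    return [ml if xi <= t else mr for xi in x]


def optimism_corrected_auroc(x, y, n_boot=50, seed=0):
    """Apparent AUROC minus the mean bootstrap optimism (assumed scheme)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auroc(y, predict(fit_stump(x, y), x))
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xb, yb = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip resamples containing a single class
            continue
        model = fit_stump(xb, yb)
        # optimism = performance on the resample minus performance on the original data
        optimism.append(auroc(yb, predict(model, xb)) - auroc(y, predict(model, x)))
    if not optimism:
        return apparent, apparent
    return apparent, apparent - sum(optimism) / len(optimism)


# Synthetic cohort (an assumption for illustration only, not registry data).
rng = random.Random(42)
ages = [rng.uniform(20, 90) for _ in range(200)]
died = [1 if rng.random() < (0.6 if a >= 65 else 0.1) else 0 for a in ages]

stump = fit_stump(ages, died)
apparent, corrected = optimism_corrected_auroc(ages, died)
print("split at age <=", round(stump[0], 1))
print("apparent AUROC:", round(apparent, 3))
print("optimism-corrected AUROC:", round(corrected, 3))
print("Brier score:", round(brier(died, predict(stump, ages)), 3))
```

Each leaf of such a tree corresponds to a patient subgroup with a single predicted mortality probability, which is what makes the paths readable as simple constraints on patient characteristics.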
Keywords