Performance of binary prediction models in high-correlation low-dimensional settings: a comparison of methods

Artuur M. Leeuwenberg; Maarten van Smeden; Johannes A. Langendijk; Arjen van der Schaaf; Murielle E. Mauer; Karel G. M. Moons; Johannes B. Reitsma; Ewoud Schuit

doi:10.1186/s41512-021-00115-5

Diagnostic and Prognostic Research (Jan 2022)

Affiliations

Artuur M. Leeuwenberg: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University
Maarten van Smeden: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University
Johannes A. Langendijk: Department of Radiation Oncology, University Medical Center Groningen, Groningen University
Arjen van der Schaaf: Department of Radiation Oncology, University Medical Center Groningen, Groningen University
Murielle E. Mauer: European Organisation for Research and Treatment of Cancer Headquarters
Karel G. M. Moons: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University
Johannes B. Reitsma: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University
Ewoud Schuit: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University

DOI: https://doi.org/10.1186/s41512-021-00115-5
Journal volume & issue: Vol. 6, no. 1
Pages: 1–13

Abstract

Background: Clinical prediction models are developed widely across medical disciplines. When predictors in such models are highly collinear, unexpected or spurious predictor-outcome associations may occur, thereby potentially reducing face-validity of the prediction model. Collinearity can be dealt with by exclusion of collinear predictors, but when there is no a priori motivation (besides collinearity) to include or exclude specific predictors, such an approach is arbitrary and possibly inappropriate.

Methods: We compare different methods to address collinearity, including shrinkage, dimensionality reduction, and constrained optimization. The effectiveness of these methods is illustrated via simulations.

Results: In the conducted simulations, no effect of collinearity was observed on predictive outcomes (AUC, R², Intercept, Slope) across methods. However, a negative effect of collinearity on the stability of predictor selection was found, affecting all compared methods, but in particular methods that perform strong predictor selection (e.g., Lasso). Methods for which the included set of predictors remained most stable under increased collinearity were Ridge, PCLR, LAELR, and Dropout.

Conclusions: Based on the results, we would recommend refraining from data-driven predictor selection approaches in the presence of high collinearity, because of the increased instability of predictor selection, even in relatively high events-per-variable settings. The selection of certain predictors over others may disproportionally give the impression that included predictors have a stronger association with the outcome than excluded predictors.

Published in Diagnostic and Prognostic Research

ISSN: 2397-7523 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General)
Website: https://diagnprognres.biomedcentral.com/
