BMC Medical Research Methodology (Dec 2021)

The roles of predictors in cardiovascular risk models - a question of modeling culture?

  • Christine Wallisch,
  • Asan Agibetov,
  • Daniela Dunkler,
  • Maria Haller,
  • Matthias Samwald,
  • Georg Dorffner,
  • Georg Heinze

DOI
https://doi.org/10.1186/s12874-021-01487-4
Journal volume & issue
Vol. 21, no. 1
pp. 1 – 12

Abstract

Background: While machine learning (ML) algorithms may predict cardiovascular outcomes more accurately than statistical models, their result is usually not representable by a transparent formula. Hence, it is often unclear how specific values of predictors lead to the predictions. We aimed to demonstrate with graphical tools how predictor-risk relations in cardiovascular risk prediction models fitted by ML algorithms and by statistical approaches may differ, and how sample size affects the stability of the estimated relations.

Methods: We reanalyzed data from a large registry of 1.5 million participants in a national health screening program. Three data analysts developed analytical strategies to predict cardiovascular events within 1 year from health screening. This was done for the full data set and with gradually reduced sample sizes, and each data analyst followed their favorite modeling approach. Predictor-risk relations were visualized by partial dependence and individual conditional expectation plots.

Results: When comparing the modeling algorithms, we found some similarities between these visualizations but also occasional divergence. The smaller the sample size, the more the predictor-risk relation depended on the modeling algorithm used, and also sampling variability played an increased role. Predictive performance was similar if the models were derived on the full data set, whereas smaller sample sizes favored simpler models.

Conclusion: Predictor-risk relations from ML models may differ from those obtained by statistical models, even with large sample sizes. Hence, predictors may assume different roles in risk prediction models. As long as sample size is sufficient, predictive accuracy is not largely affected by the choice of algorithm.
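
The two visualization tools named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the toy logistic risk model, its coefficients, the simulated predictors (age and a binary indicator), and the function names `ice_curves` and `partial_dependence` are all assumptions made for illustration. An individual conditional expectation (ICE) curve shows how one participant's predicted risk changes as a single predictor is varied over a grid while the other predictors keep their observed values; the partial dependence curve is the average of the ICE curves.

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Return an (n_samples, len(grid)) array with one ICE curve per row."""
    X = np.asarray(X, dtype=float)
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value      # set the predictor to the grid value for everyone
        curves[:, j] = predict(X_mod)  # all other predictors keep their observed values
    return curves

def partial_dependence(predict, X, feature, grid):
    """Partial dependence is the pointwise mean of the ICE curves."""
    return ice_curves(predict, X, feature, grid).mean(axis=0)

# Hypothetical risk model (assumption): logistic in age and a binary indicator.
def toy_risk(X):
    linear = -3.0 + 0.04 * X[:, 0] + 0.5 * X[:, 1]
    return 1.0 / (1.0 + np.exp(-linear))

# Simulated data (assumption): 200 participants aged 40 to 80.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(40, 80, 200), rng.integers(0, 2, 200)])

grid = np.linspace(40, 80, 5)
ice = ice_curves(toy_risk, X, feature=0, grid=grid)
pd_curve = partial_dependence(toy_risk, X, feature=0, grid=grid)
```

Any fitted model that exposes a prediction function could be passed in place of `toy_risk`, which is what allows predictor-risk relations from different modeling approaches to be compared on the same axes.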

Keywords