Comparison of machine learning methods with logistic regression analysis in creating predictive models for risk of critical in-hospital events in COVID-19 patients on hospital admission

Aaron W. Sievering; Peter Wohlmuth; Nele Geßler; Melanie A. Gunawardene; Klaus Herrlinger; Berthold Bein; Dirk Arnold; Martin Bergmann; Lorenz Nowak; Christian Gloeckner; Ina Koch; Martin Bachmann; Christoph U. Herborn; Axel Stang

doi:10.1186/s12911-022-02057-4

BMC Medical Informatics and Decision Making (Nov 2022)

Affiliations

Aaron W. Sievering: Semmelweis University
Peter Wohlmuth: Semmelweis University
Nele Geßler: Semmelweis University
Melanie A. Gunawardene: Department of Cardiology and Intensive Care Medicine, Asklepios Hospital St. Georg
Klaus Herrlinger: Department of Internal Medicine, Asklepios Hospital Nord-Heidberg
Berthold Bein: Department of Anesthesiology and Intensive Care Medicine, Asklepios Hospital St. Georg
Dirk Arnold: Asklepios Tumorzentrum
Martin Bergmann: Department of Internal Medicine, Cardiology, and Pneumology, Asklepios Hospital Wandsbek
Lorenz Nowak: Department of Intensive Care and Ventilation Medicine, Asklepios Hospital München-Gauting
Christian Gloeckner: Department of Internal Medicine, Asklepios Hospital Oberviechtach
Ina Koch: Biobank for Pulmonary Diseases, Asklepios Hospital München-Gauting
Martin Bachmann: Department of Intensive Care and Ventilatory Medicine, Asklepios Hospital Harburg
Christoph U. Herborn: Semmelweis University
Axel Stang: Semmelweis University

Journal volume & issue: Vol. 22, no. 1
pp. 1 – 14

Abstract

Background: Machine learning (ML) algorithms have been trained for early prediction of critical in-hospital events from COVID-19 using patient data at admission, but little is known about how their performance compares with each other and/or with statistical logistic regression (LR). This prospective multicentre cohort study compares the performance of an LR model and five ML models, and examines how influential predictors and predictor-to-event relationships contribute to each prediction model's performance.

Methods: We used 25 baseline variables of 490 COVID-19 patients admitted to 8 hospitals in Germany (March–November 2020) to develop and validate (75/25 random split) 3 linear (L1 and L2 penalty, elastic net [EN]) and 2 non-linear (support vector machine [SVM] with radial kernel, random forest [RF]) ML approaches for predicting critical events, defined by intensive care unit transfer, invasive ventilation and/or death (composite end-point: 181 patients). Models were compared for performance (area under the receiver operating characteristic curve [AUC], Brier score) and predictor importance (performance-loss metrics, partial-dependence profiles).

Results: Models performed similarly, with a small benefit for LR (using restricted cubic splines for non-linearity) and RF (mean AUC: 0.763–0.731 [RF–L1]; Brier scores: 0.184–0.197 [LR–L1]). Top-ranked predictor variables (consistently highest importance: C-reactive protein) were largely identical across models, except creatinine, which exhibited marginal (L1, L2, EN, SVM) or high/non-linear effects (LR, RF) on events.

Conclusions: Although the LR and ML models analysed showed no strong differences in performance or in the most influential predictors for COVID-19-related event prediction, our results indicate a predictive benefit from accounting for non-linear predictor-to-event relationships and effects. Future efforts should focus on advancing data-driven ML technologies from static towards dynamic modelling solutions that continuously learn and adapt to changes in data environments during the evolving pandemic.

Trial registration number: NCT04659187.
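The two performance measures named in the abstract, AUC and Brier score, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `brier_score` and the toy labels and probabilities are hypothetical, chosen only to show how each metric is computed from observed events (1 = critical event, 0 = none) and predicted risks.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome (0/1).

    Lower is better; a constant prediction of 0.5 always scores 0.25.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen event case receives a
    higher predicted risk than a randomly chosen non-event case; ties
    count as one half. Quadratic in sample size, fine for small cohorts.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("inputs must be of equal length")
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p_event in pos:
        for p_none in neg:
            if p_event > p_none:
                wins += 1.0
            elif p_event == p_none:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    outcomes = [1, 0, 1, 0, 0, 1]
    risks = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
    print("AUC:", round(auc(outcomes, risks), 3))
    print("Brier score:", round(brier_score(outcomes, risks), 3))
```

AUC measures discrimination (ranking events above non-events), while the Brier score also penalises poorly calibrated probabilities, which is one reason to report both when comparing models.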

Published in BMC Medical Informatics and Decision Making

ISSN: 1472-6947 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: http://bmcmedinformdecismak.biomedcentral.com
