Cardiac surgery risk prediction using ensemble machine learning to incorporate legacy risk scores: A benchmarking study

Tim Dong; Shubhra Sinha; Ben Zhai; Daniel P Fudulu; Jeremy Chan; Pradeep Narayan; Andy Judge; Massimo Caputo; Arnaldo Dimagli; Umberto Benedetto; Gianni D Angelini

doi:10.1177/20552076231187605

Digital Health (Jul 2023)

Affiliations

Tim Dong: Translational Health Sciences, University of Bristol, Bristol, UK
Shubhra Sinha: Translational Health Sciences, University of Bristol, Bristol, UK
Ben Zhai: School of Computing Science, Newcastle upon Tyne, UK
Daniel P Fudulu: Translational Health Sciences, University of Bristol, Bristol, UK
Jeremy Chan: Translational Health Sciences, University of Bristol, Bristol, UK
Pradeep Narayan: Department of Cardiac Surgery, Kolkata, India
Andy Judge: Translational Health Sciences, University of Bristol, Bristol, UK
Massimo Caputo: Translational Health Sciences, University of Bristol, Bristol, UK
Arnaldo Dimagli: Translational Health Sciences, University of Bristol, Bristol, UK
Umberto Benedetto: Translational Health Sciences, University of Bristol, Bristol, UK
Gianni D Angelini: Translational Health Sciences, University of Bristol, Bristol, UK

DOI: https://doi.org/10.1177/20552076231187605
Journal volume & issue: Vol. 9

Abstract

Objective: The introduction of new clinical risk scores (e.g. European System for Cardiac Operative Risk Evaluation (EuroSCORE) II) that supersede original scores (e.g. EuroSCORE I) and use different variable sets typically results in disparate datasets, because the new score variables have high levels of missingness before the time of adoption. Little is known about the use of ensemble learning to incorporate disparate data from legacy scores. We tested the hypothesis that homogeneous and heterogeneous machine learning (ML) ensembles perform better than ensembles of Dynamic Model Averaging (DMA) for combining knowledge from EuroSCORE I legacy data with EuroSCORE II data to predict cardiac surgery risk.

Methods: Using the National Adult Cardiac Surgery Audit dataset, we trained 12 different base learner models, based on two different variable sets from either EuroSCORE I (LogES) or EuroSCORE II (ES II), partitioned by the time of score adoption (1996–2016 or 2012–2016) and evaluated on a holdout set (2017–2019). These base learner models were ensembled using nine different combinations of six ML algorithms to produce homogeneous or heterogeneous ensembles. Performance was assessed using a consensus metric.

Results: The XGBoost homogeneous ensemble (HE) was the highest performing model (clinical effectiveness metric (CEM) 0.725; area under the curve (AUC) 0.8327; 95% confidence interval (CI) 0.8323–0.8329), followed by the Random Forest HE (CEM 0.723; AUC 0.8325; 95% CI 0.8320–0.8326). Across the different heterogeneous ensembles, significantly better performance was obtained by combining siloed datasets across time (CEM 0.720) than by building ensembles of either the 1996–2011 (t-test adjusted, p = 1.67 × 10⁻⁶) or the 2012–2019 (t-test adjusted, p = 1.35 × 10⁻¹⁹³) datasets alone.

Conclusions: Both homogeneous and heterogeneous ML ensembles performed significantly better than the DMA ensemble of Bayesian Update models. Time-dependent ensemble combination of variables, whose quality differs according to the time of score adoption, enabled previously siloed data to be combined, leading to increased power, clinical interpretability of variables and usage of data.
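
The ensembling idea in the abstract (base learners trained on different variable sets and time windows, then combined by a second-level model) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, a hand-written logistic regression stands in for the ML algorithms named in the abstract, and every name here (`make_records`, `legacy_vars`, `new_vars`, `base_outputs`) is hypothetical. For brevity the meta-learner is also fitted on in-sample base predictions, whereas a careful stacking setup would normally use out-of-fold predictions.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Predicted probability for one feature vector x, given weights [bias, w_1, ...]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Logistic regression by batch gradient descent; returns [bias, w_1, ..., w_k]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def make_records(n, first_year, last_year):
    """Synthetic patients (assumption: one 'legacy' and one 'newer' risk variable)."""
    rows = []
    for _ in range(n):
        legacy, newer = random.gauss(0, 1), random.gauss(0, 1)
        risk = sigmoid(-1.0 + legacy + newer)
        rows.append({
            "year": random.randint(first_year, last_year),
            "legacy_vars": [legacy],          # stands in for the older score's variables
            "new_vars": [legacy, newer],      # stands in for the newer score's variables
            "died": 1 if random.random() < risk else 0,
        })
    return rows


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


train = make_records(600, 1996, 2016)
holdout = make_records(300, 2017, 2019)
# Mimic missingness before adoption: the newer variables are only used from 2012 on.
recent = [r for r in train if r["year"] >= 2012]

# Base learner 1: legacy variables, whole training period.
base_legacy = fit_logistic([r["legacy_vars"] for r in train], [r["died"] for r in train])
# Base learner 2: newer variables, post-adoption period only.
base_new = fit_logistic([r["new_vars"] for r in recent], [r["died"] for r in recent])


def base_outputs(r):
    """Base-learner probabilities, used as input features for the meta-learner."""
    return [predict(base_legacy, r["legacy_vars"]), predict(base_new, r["new_vars"])]


# Meta-learner: combines the two base learners' predicted probabilities.
meta = fit_logistic([base_outputs(r) for r in recent], [r["died"] for r in recent])

labels = [r["died"] for r in holdout]
stack_preds = [predict(meta, base_outputs(r)) for r in holdout]
auc_legacy = auc([predict(base_legacy, r["legacy_vars"]) for r in holdout], labels)
auc_stack = auc(stack_preds, labels)
print(f"legacy-only AUC: {auc_legacy:.3f}  stacked AUC: {auc_stack:.3f}")
```

The point of the design is that records from before the newer score's adoption still contribute through the legacy-variable base learner, rather than being discarded for lack of the newer variables.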

Published in Digital Health

ISSN: 2055-2076 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://journals.sagepub.com/home/dhj