Frontiers in Cardiovascular Medicine (Mar 2023)

Data processing pipeline for cardiogenic shock prediction using machine learning

  • Nikola Jajcay,
  • Branislav Bezak,
  • Amitai Segev,
  • Shlomi Matetzky,
  • Jana Jankova,
  • Michael Spartalis,
  • Mohammad El Tahlawi,
  • Federico Guerra,
  • Julian Friebel,
  • Tharusan Thevathasan,
  • Imrich Berta,
  • Leo Pölzl,
  • Felix Nägele,
  • Edita Pogran,
  • F. Aaysha Cader,
  • Milana Jarakovic,
  • Can Gollmann-Tepeköylü,
  • Marta Kollarova,
  • Katarina Petrikova,
  • Otilia Tica,
  • Konstantin A. Krychtiuk,
  • Guido Tavazzi,
  • Carsten Skurk,
  • Kurt Huber,
  • Allan Böhm

DOI
https://doi.org/10.3389/fcvm.2023.1132680
Journal volume & issue
Vol. 10

Abstract

Introduction: Recent advances in machine learning provide new possibilities to process and analyse observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction from the MIMIC III database of intensive cardiac care unit patients with acute coronary syndrome. Identifying high-risk patients could allow pre-emptive measures to be taken and thus prevent the development of CS.

Methods: We focus mainly on techniques for the imputation of missing data: we build an imputation pipeline and compare the performance of several multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations. After imputation, we select the final subjects and variables from the imputed dataset and showcase the performance of a gradient-boosted framework that uses a tree-based classifier for CS prediction.

Results: We achieved good classification performance (cross-validated mean area under the curve 0.805) thanks to data cleaning and imputation, without hyperparameter optimization.

Conclusion: We believe our pre-processing pipeline would also prove helpful for other classification and regression experiments.
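To make the idea of comparing imputation algorithms concrete, the following is a minimal sketch and not the authors' pipeline. It implements a simple k-nearest-neighbours imputer and a column-mean baseline in plain Python, then scores both by masking known values in synthetic data and measuring the reconstruction error. The function names (`mean_impute`, `knn_impute`, `rmse`, `make_toy_data`), the toy data, and the choice of root-mean-square error as the comparison metric are all assumptions made for illustration; none of them come from the paper, and no MIMIC III data is used.

```python
import math
import random


def mean_impute(rows):
    """Baseline: replace each missing value (None) with its column mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=3):
    """Fill each missing value with the mean of that column over the k
    nearest rows. Distance uses only the columns observed in both rows."""

    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    fallback = mean_impute(rows)  # used when no comparable neighbour exists
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for dist, v in donors[:k] if dist != math.inf]
            filled[i][j] = (sum(nearest) / len(nearest)
                            if nearest else fallback[i][j])
    return filled


def rmse(truth, filled, cells):
    """Root-mean-square error over the (row, column) cells that were masked."""
    return math.sqrt(sum((truth[i][j] - filled[i][j]) ** 2
                         for i, j in cells) / len(cells))


def make_toy_data(n_rows=40, missing_rate=0.2, seed=0):
    """Synthetic, correlated three-column data with randomly masked cells."""
    rng = random.Random(seed)
    truth = []
    for _ in range(n_rows):
        x = rng.uniform(0.0, 10.0)
        truth.append([x, 2.0 * x + rng.gauss(0.0, 0.5), -x + rng.gauss(0.0, 0.5)])
    masked = [list(r) for r in truth]
    cells = []
    for i in range(n_rows):
        for j in range(3):
            if rng.random() < missing_rate:
                masked[i][j] = None
                cells.append((i, j))
    return truth, masked, cells


if __name__ == "__main__":
    truth, masked, cells = make_toy_data()
    print("mean imputation RMSE:", rmse(truth, mean_impute(masked), cells))
    print("kNN imputation RMSE: ", rmse(truth, knn_impute(masked, k=3), cells))
```

Masking values whose true content is known is one common way to rank imputers before any classifier is trained. A real comparison of the kind described in the abstract would also have to cover the SVD-based methods and Multiple Imputation by Chained Equations, which this sketch omits.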

Keywords