Informatics in Medicine Unlocked (Jan 2022)

Applied forecasting for delayed cerebral ischemia prediction post subarachnoid hemorrhage: Methodological fallacies

  • Georgios Alexopoulos,
  • Justin Zhang,
  • Ioannis Karampelas,
  • Maheen Khan,
  • Nabiha Quadri,
  • Mayur Patel,
  • Niel Patel,
  • Mohammad Almajali,
  • Tobias A. Mattei,
  • Joanna Kemp,
  • Jeroen Coppens,
  • Philippe Mercier

Journal volume & issue
Vol. 28
p. 100817

Abstract

Introduction: Delayed cerebral ischemia (DCI) is an important cause of morbidity and mortality after aneurysmal subarachnoid hemorrhage (aSAH). Researchers have used various methods to identify patients at risk of progressing to DCI.

Methods: We conducted an eight-year retrospective review of aSAH patients who presented to St Louis University Hospital. The records were screened for demographic, clinical, and radiographic parameters, with DCI as the primary outcome. We identified 16 features to fit various forecasting models and selected the best binary classifier through comprehensive machine learning (ML) workflows. Regression and ensemble tree-based algorithms were used because of their performance on tabular data. We investigated whether a single model could outperform the others on our dataset. Because of the expected class imbalance in the outcome (DCI), we selected precision, recall, and F-score as threshold metrics. Precision-recall curves were used to rank model performance.

Results: Of the 213 aSAH patients analyzed, 42 progressed to DCI (19.7%). The mean age was 55.7 years. The outcome variable (DCI) was imbalanced, with a class ratio of approximately 1:4. Bivariate analysis revealed two significant associations: the “Hunt-and-Hess scale” (p-value = 0.016) and “Posthemorrhagic hydrocephalus” (p-value < 0.001). The all-relevant factors identified during feature selection were “Fisher scale,” “Modified Fisher scale,” “Hunt-and-Hess scale,” and “Posthemorrhagic hydrocephalus”. “Treatment type” was tentative. The random forests model achieved a pooled accuracy of 71.1% (95%CI: 60.4, 83.4) with an F1-score of 0.484. The best binary classifier used extreme gradient boosting trained on the all-relevant predictors plus “Aneurysm type”; it achieved a predictive accuracy of 84.3% (95%CI: 75.9, 93.4) with an F1-score of 0.684. We describe the challenges that arise when training a binary classifier on imbalanced datasets. Through an extensive comparative review of similar published studies, we not only demonstrate the model's performance but also identify multiple methodological fallacies in forecasting within neurological research.

Conclusion: By combining baseline patient characteristics with radiographic grading scales, we built a binary classifier for DCI prediction that is simple yet robust, highly accurate and, most importantly, useful. The model is available online and can be used clinically as an effective forecasting tool (https://georgiosalexopoulos.shinyapps.io/download/).
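
The choice of precision, recall, and F-score over accuracy can be illustrated with the class counts reported above (42 DCI cases among 213 patients). The following is a minimal sketch, not the authors' code: the `ConfusionCounts` class and `majority_baseline` helper are hypothetical names introduced here, and the "classifier" is a trivial rule that predicts "no DCI" for every patient. Treating an undefined precision (no positive predictions) as 0 is a convention assumed for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (positive class = DCI)."""
    tp: int  # DCI patients predicted as DCI
    fp: int  # non-DCI patients predicted as DCI
    fn: int  # DCI patients predicted as non-DCI
    tn: int  # non-DCI patients predicted as non-DCI

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def precision(self) -> float:
        # Undefined when nothing is predicted positive; 0.0 by assumed convention.
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def majority_baseline(n_total: int, n_positive: int) -> ConfusionCounts:
    """Confusion counts for a rule that always predicts the negative class."""
    return ConfusionCounts(tp=0, fp=0, fn=n_positive, tn=n_total - n_positive)


if __name__ == "__main__":
    # Class counts taken from the abstract: 213 patients, 42 with DCI.
    baseline = majority_baseline(n_total=213, n_positive=42)
    print(f"accuracy:  {baseline.accuracy:.3f}")
    print(f"precision: {baseline.precision:.3f}")
    print(f"recall:    {baseline.recall:.3f}")
    print(f"F1-score:  {baseline.f1:.3f}")
```

Under these assumptions, the always-negative rule reaches roughly 80% accuracy (171 of 213 correct) while identifying no DCI patient at all, so its recall and F1-score are both zero. This is why accuracy alone is a weak ranking criterion at a class ratio near 1:4.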

Keywords