E-CatBoost: An efficient machine learning framework for predicting ICU mortality using the eICU Collaborative Research Database.

Nima Safaei; Babak Safaei; Seyedhouman Seyedekrami; Mojtaba Talafidaryani; Arezoo Masoud; Shaodong Wang; Qing Li; Mahdi Moqri

doi:10.1371/journal.pone.0262895

PLoS ONE (Jan 2022)

Journal volume & issue: Vol. 17, no. 5, p. e0262895

Abstract

Improving Intensive Care Unit (ICU) management and building cost-effective, well-managed healthcare systems are high priorities for healthcare units. Accurate and explainable mortality prediction models help identify the most critical risk factors for patient survival and detect the most in-need patients early. This study proposes a highly accurate and efficient machine learning model for predicting ICU mortality status upon discharge using the information available during the first 24 hours of admission. The most important features in mortality prediction are identified, and the effect of changing each feature on the prediction is studied. We used supervised machine learning models and illness severity scoring systems as benchmarks for mortality prediction. We also combined SHAP, LIME, partial dependence, and individual conditional expectation plots to explain the predictions made by the best-performing model (CatBoost). We propose E-CatBoost, an optimized and efficient patient mortality prediction model that can accurately predict patients' discharge status using only ten input features. We used eICU-CRD v2.0 to train and validate the models; the dataset contains information on over 200,000 ICU admissions. The patients were divided into twelve disease groups, and models were fitted and tuned for each group. The models' predictive performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Across the defined disease groups, the AUROC scores ranged from 0.86 [std:0.02] to 0.92 [std:0.02] for CatBoost and from 0.83 [std:0.02] to 0.91 [std:0.03] for E-CatBoost; measured over the entire patient population, their AUROC scores were 7 to 18 percent and 2 to 12 percent higher than those of the baseline models, respectively. Based on SHAP explanations, we found age, heart rate, respiratory rate, blood urea nitrogen, and creatinine level to be the most critical cross-disease features in mortality prediction.
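As an illustration of how an AUROC score with a standard deviation, such as "0.86 [std:0.02]", could be produced, the following is a minimal sketch and not the authors' code. It assumes the spread is taken over repeated evaluation splits, which the abstract does not state. The helper names `auroc` and `summarize` and the toy labels and scores are hypothetical.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as the probability that a random positive case (death = 1)
    receives a higher score than a random negative case; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(fold_results):
    """Mean and sample standard deviation of AUROC over evaluation splits.
    Assumption: the reported std is computed across splits like these."""
    per_fold = [auroc(labels, scores) for labels, scores in fold_results]
    return mean(per_fold), stdev(per_fold)


# Toy (labels, predicted mortality risk) pairs for three hypothetical splits.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    ([1, 0, 0, 1], [0.8, 0.6, 0.1, 0.5]),
    ([0, 1, 0, 1], [0.3, 0.3, 0.2, 0.9]),
]
mean_auc, std_auc = summarize(folds)
print(f"AUROC {mean_auc:.2f} [std:{std_auc:.2f}]")
```

The pairwise formulation is quadratic in the number of patients, so it is only suitable for small illustrations; a rank-based implementation would be preferable on a dataset the size of eICU-CRD.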

Published in PLoS ONE

ISSN: 1932-6203 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Medicine; Science
Website: https://journals.plos.org/plosone/