Health Data Science (Jan 2021)
Predicting Risk of Mortality in Pediatric ICU Based on Ensemble Step-Wise Feature Selection
Abstract
Background. Prediction of mortality risk in intensive care units (ICU) is an important task. Data-driven methods such as scoring systems, machine learning methods, and deep learning methods have been investigated for a long time. However, few data-driven methods are specially developed for pediatric ICU. In this paper, we aim to amend this gap—build a simple yet effective linear machine learning model from a number of hand-crafted features for mortality prediction in pediatric ICU. Methods. We use a recently released publicly available pediatric ICU dataset named pediatric intensive care (PIC) from Children’s Hospital of Zhejiang University School of Medicine in China. Unlike previous sophisticated machine learning methods, we want our method to keep simple that can be easily understood by clinical staffs. Thus, an ensemble step-wise feature ranking and selection method is proposed to select a small subset of effective features from the entire feature set. A logistic regression classifier is built upon selected features for mortality prediction. Results. The final predictive linear model with 11 features achieves a 0.7531 ROC-AUC score on the hold-out test set, which is comparable with a logistic regression classifier using all 397 features (0.7610 ROC-AUC score) and is higher than the existing well known pediatric mortality risk scorer PRISM III (0.6895 ROC-AUC score). Conclusions. Our method improves feature ranking and selection by utilizing an ensemble method while keeping a simple linear form of the predictive model and therefore achieves better generalizability and performance on mortality prediction in pediatric ICU.