IEEE Access (Jan 2022)

A Comprehensive Machine Learning Based Pipeline for an Accurate Early Prediction of Sepsis in ICU

  • B. C. Srimedha
  • Rashmi Naveen Raj
  • Veena Mayya

DOI
https://doi.org/10.1109/ACCESS.2022.3210575
Journal volume & issue
Vol. 10
pp. 105120 – 105132

Abstract


Sepsis is a lethal, infection-related condition with an extremely high fatality rate, especially among intensive care unit (ICU) patients. Early and precise recognition of sepsis is critical, because delayed treatment dramatically increases mortality. The systemic inflammatory response syndrome (SIRS) criteria, the quick Sequential Organ Failure Assessment (qSOFA), and the Modified Early Warning Score (MEWS) are the traditional clinical scoring systems used in practice to detect sepsis. However, these scoring systems fail at early prediction, which is the stage at which immediate treatment would reduce mortality significantly. The proposed classifier can accurately predict sepsis up to six hours before it is clinically diagnosed, using the patient's electronic medical records, demographics, and vital signs. The study applies data preprocessing strategies adapted to the dataset. The proposed method adds to the existing literature by introducing a novel outlier-based mean-median data imputation technique that improves the overall prediction accuracy. The primary factors that influence the classifier's predictions are outlined, making the model easier for medical professionals to understand. Four algorithms were investigated for classifying patients as sepsis positive or negative: Random Forest, Logistic Regression, Gradient Boosting, and Decision Tree. Of these, Random Forest gives the best results, with an accuracy of 99.01%, an F1-score of 99%, and an area under the receiver operating characteristic curve (AUROC) of 99.99%. Even for 24-hour early prediction of sepsis, Random Forest provides the highest prediction accuracy, while Logistic Regression provides the lowest. We attribute this to the fact that, unlike logistic regression, which models the log-odds of the outcome as a linear function of the predictors, random forests do not assume a linear relationship between the dependent and independent variables. The resulting evaluation metrics indicate that the pipeline can be highly valuable for predicting sepsis in a timely and accurate manner.
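The abstract does not spell out how the outlier-based mean-median imputation decides between the two statistics. The following is a minimal sketch under one plausible assumption, not the authors' published rule: for each feature, if the observed values contain outliers under Tukey's 1.5 × IQR fences, missing entries are filled with the median (which is robust to outliers); otherwise they are filled with the mean. The function names `has_outliers`, `impute_column`, and `impute_table`, and the `k = 1.5` fence multiplier, are hypothetical choices for illustration.

```python
from statistics import mean, median, quantiles
from typing import List, Optional


def has_outliers(values: List[float], k: float = 1.5) -> bool:
    """Assumed outlier test (Tukey's fences): a value is an outlier if it lies
    outside [Q1 - k*IQR, Q3 + k*IQR]. k=1.5 is a hypothetical default."""
    if len(values) < 4:
        return False  # too few points to estimate quartiles (hypothetical cutoff)
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return any(v < low or v > high for v in values)


def impute_column(column: List[Optional[float]]) -> List[float]:
    """Fill None entries with the median if the observed values contain
    outliers, otherwise with the mean (assumed mean-median rule)."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = median(observed) if has_outliers(observed) else mean(observed)
    return [fill if v is None else v for v in column]


def impute_table(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Apply impute_column to every feature (column) of a row-major table."""
    columns = [list(col) for col in zip(*rows)]  # transpose to columns
    imputed = [impute_column(col) for col in columns]
    return [list(row) for row in zip(*imputed)]  # transpose back to rows


if __name__ == "__main__":
    # Hypothetical vital-sign rows: [heart_rate, respiratory_rate]; None = missing.
    rows = [[78, 16], [80, 18], [82, None], [None, 20],
            [84, 22], [86, None], [88, 24], [250, 26]]
    for row in impute_table(rows):
        print(row)
```

In this toy table, the heart-rate column contains an extreme reading (250), so its missing value is filled with the median (84); the respiratory-rate column has no outliers, so its missing values are filled with the mean (21). In a real pipeline, the fill statistics would normally be computed on the training split only and then reused on validation and test data to avoid information leakage.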

Keywords