Comparison of nine machine learning regression models in predicting hospital length of stay for patients admitted to a general medicine department

Addisu Jember Zeleke; Pierpaolo Palumbo; Paolo Tubertini; Rossella Miglio; Lorenzo Chiari

Informatics in Medicine Unlocked (Jan 2024)

Affiliations

Addisu Jember Zeleke: Department of Electrical, Electronic, and Information Engineering Guglielmo Marconi, University of Bologna, 40126, Bologna, Italy; Corresponding author.
Pierpaolo Palumbo: Department of Electrical, Electronic, and Information Engineering Guglielmo Marconi, University of Bologna, 40126, Bologna, Italy
Paolo Tubertini: Enterprise Information Systems for Integrated Care and Research Data Management, IRCCS Azienda Ospedaliero-Universitaria di Bologna, 40138, Bologna, Italy
Rossella Miglio: Department of Statistical Sciences, University of Bologna, 40126, Bologna, Italy
Lorenzo Chiari: Department of Electrical, Electronic, and Information Engineering Guglielmo Marconi, University of Bologna, 40126, Bologna, Italy; Health Sciences and Technologies Interdepartmental Center for Industrial Research (CIRI SDV), University of Bologna, 40126, Bologna, Italy

Journal volume & issue: Vol. 47
p. 101499

Abstract

Background: The General Medicine (GM) department has the highest patient volume and the greatest heterogeneity among hospital specialties. Because patients present with a wide range of conditions and traits, close examination of hospitalization data is crucial. Length of stay (LoS) in hospital is often used as an efficiency indicator. It is influenced by various factors, including the patient's medical background, demographics, and the type of diseases, signs, and symptoms recorded at triage. LoS varies widely, which makes it difficult to estimate promptly and accurately, yet such an estimate is highly beneficial. Moreover, efficiently grouping and managing patients based on their expected LoS remains a significant challenge for healthcare organizations.

Objectives: This study aimed to compare the predictive ability of nine Machine Learning (ML) regression models in estimating the number of LoS days, using demographics and clinical information recorded at admission as independent variables.

Methods: We analyzed data collected on patients hospitalized at the GM department of the Sant'Orsola-Malpighi University Hospital in Bologna, Italy, who were admitted through the Emergency Department. The data were collected from January 1, 2022, to October 26, 2022. Nine ML regression models were used to predict LoS by analyzing historical data and patient information. The models' performance was assessed through root mean squared prediction error (RMSPE) and mean absolute prediction error (MAPE). Moreover, we used K-means clustering to group patients' medical and organizational criticalities (such as diseases, signs, symptoms, and administrative problems) into four clusters. Feature Importance plots and SHAP (SHapley Additive exPlanations) values were employed to identify the most important features and enhance the interpretability of the results.

Results: We analyzed the LoS of 3757 eligible patients, which had a mean of 13 days and a standard deviation of 11.8 days. We randomly divided patients into a training cohort of 2630 (70 %) and a test cohort of 1127 (30 %). Across the different models, RMSPE ranged from 11.00 to 16.16 days and MAPE from 7.52 to 10.78 days. The eXtreme Gradient Boosting Regression (XGBR) model had the lowest prediction error in terms of both RMSPE (11.00 days) and MAPE (7.52 days). Sex, arrival via own vehicle/walk-in, ambulance arrival, light blue risk category, age 70 or older, and orange risk category were among the top features.

Conclusion: The ML models evaluated in this study reported good predictive performance, with the XGBR model exhibiting the lowest prediction error. This model has the potential to aid physicians in administering appropriate clinical interventions for patients in the GM department. It can also help healthcare services predict the resources needed to better manage hospitalization.
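As an illustration of the evaluation described in the abstract, the following is a minimal sketch of a random 70/30 split and of the two error metrics, RMSPE and MAPE (here the mean absolute prediction error in days, not a percentage). The function names, the seed, and the toy LoS values are hypothetical and are not taken from the study; the authors' actual pipeline is not described at this level of detail.

```python
import math
import random


def rmspe(actual, predicted):
    """Root mean squared prediction error, in the unit of the target (days)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute prediction error in days (not a percentage)."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def split_70_30(n_patients, seed=0):
    """Randomly split patient indices into 70 % training and 30 % test."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    n_train = round(0.7 * n_patients)
    return indices[:n_train], indices[n_train:]


# Cohort size from the abstract; the split itself is only illustrative.
train_idx, test_idx = split_70_30(3757)

# Hypothetical toy values, NOT data from the study.
actual_days = [3, 10, 21, 7]
predicted_days = [5, 9, 15, 7]
toy_rmspe = rmspe(actual_days, predicted_days)
toy_mape = mape(actual_days, predicted_days)
```

Because RMSPE squares each error before averaging, a few very long stays that are badly predicted raise it more than they raise MAPE, which is consistent with RMSPE being the larger of the two figures reported for every model.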

Published in Informatics in Medicine Unlocked

ISSN: 2352-9148 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.journals.elsevier.com/informatics-in-medicine-unlocked/
