Scientific Reports (Feb 2023)
Machine learning to improve frequent emergency department use prediction: a retrospective cohort study
Abstract
Frequent emergency department use is associated with many adverse events, such as increased risk of hospitalization and mortality. Frequent users have complex needs, and the factors associated with frequent use are commonly evaluated using logistic regression. However, other machine learning models, especially those exploiting the potential of large databases, have been less explored. This study aims to compare the performance of logistic regression with that of four machine learning models for predicting frequent emergency department use in an adult population with chronic diseases in the province of Quebec (Canada). This is a retrospective population-based study using medical and administrative databases from the Régie de l’assurance maladie du Québec. Two definitions of frequent emergency department use (the outcome to predict) were used: at least three visits and at least five visits during a one-year period. Independent variables included sociodemographic characteristics, healthcare service use, and chronic diseases. We compared the performance of logistic regression with gradient boosting machine, naïve Bayes, neural networks, and random forests (binary and continuous outcome) using the area under the ROC curve, sensitivity, specificity, positive predictive value, and negative predictive value. Out of 451,775 ED users, 43,151 (9.5%) and 13,676 (3.0%) were frequent users with at least three and five visits per year, respectively. Random forests with a binary outcome had the lowest performance (area under the ROC curve: 53.8 [95% confidence interval 53.5–54.0] and 51.4 [95% confidence interval 51.1–51.8] for the three-visit and five-visit definitions, respectively), while the other models had superior and overall similar performance. The most important variable in prediction was the number of emergency department visits in the previous year. No model outperformed the others.
Innovations in algorithms may slightly refine current predictions, but access to other variables may be more helpful in the case of frequent emergency department use prediction.
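As an illustration of the five performance measures named in the abstract, the following minimal sketch computes them on a small made-up sample. It is not the authors' code: the function names (`confusion_metrics`, `roc_auc`), the toy labels and scores, and the 0.5 decision threshold are all assumptions chosen for illustration.

```python
# Illustrative sketch only; toy data, hypothetical function names.

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV from binary labels (1 = frequent user).

    Assumes every denominator is non-zero, i.e. both classes occur
    among the true labels and among the predictions.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve as the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy sample: 3 frequent users among 10 patients (made-up values).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

The area under the ROC curve does not depend on the threshold, whereas the other four measures do. With a rare outcome such as the five-visit definition (3.0% of ED users), the positive predictive value in particular is sensitive to the threshold chosen.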