Western Journal of Emergency Medicine (Dec 2023)
Development and External Validation of Clinical Features-based Machine Learning Models for Predicting COVID-19 in the Emergency Department
Abstract
Introduction: Timely diagnosis of patients affected by an emerging infectious disease plays a crucial role in treating patients and avoiding disease spread. In prior research, we developed an approach that uses machine learning (ML) algorithms to predict severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection based on clinical features of patients visiting an emergency department (ED) early in the coronavirus disease 2019 (COVID-19) pandemic. In this study, we aimed to externally validate this approach in a distinct ED population.
Methods: For the training/validation cohort (model development), we retrospectively collected data from patients with suspected COVID-19 at a US ED from February 23–May 12, 2020. A second dataset, the external validation (testing) cohort, was collected from an ED in another country from May 12–June 15, 2021. Clinical features, including patient demographics and triage information, were used to train and test the models. The primary outcome was a confirmed diagnosis of COVID-19, defined as a positive reverse transcription polymerase chain reaction test result for SARS-CoV-2. We employed three ML algorithms (gradient boosting, random forest, and extra trees classifiers) to construct the predictive models. Predictive performance was evaluated in the testing cohort with the area under the receiver operating characteristic curve (AUC).
Results: In total, 580 and 946 ED patients were included in the training and testing cohorts, respectively; of them, 98 (16.9%) and 180 (19.0%) were diagnosed with COVID-19. All of the constructed ML models showed acceptable discrimination, as indicated by the AUC. Random forest had the highest AUC (0.785, 95% confidence interval [CI] 0.747–0.822), followed by gradient boosting (0.774, 95% CI 0.739–0.811) and the extra trees classifier (0.72, 95% CI 0.677–0.762); however, the differences between the models were not statistically significant.
Conclusion: Our study validates the use of ML for predicting COVID-19 in the ED and demonstrates its potential for predicting emerging infectious diseases with models built from clinical features, even when the development and validation populations differ in time and place. This approach holds promise for future scenarios in which effective diagnostic tools for an emerging infectious disease are lacking.
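As an illustration of the evaluation metric reported above, the following is a minimal sketch of how an AUC and a 95% CI could be computed from predicted scores and binary outcomes. The abstract does not state how the confidence intervals were obtained, so the percentile bootstrap used here is an assumption, as are the function names `auc` and `bootstrap_auc_ci` and the toy inputs, which are not study data.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, not taken from the paper)."""
    auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class (labels are 0/1)
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example (hypothetical values): 1 = RT-PCR positive, 0 = negative.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 0.75
```

In practice, the scores would be the predicted probabilities that each fitted classifier assigns to the testing cohort, and the interval would be computed once per model.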