Assessing the Generalizability of a Clinical Machine Learning Model Across Multiple Emergency Departments

Alexander J. Ryu, MD; Santiago Romero-Brufau, MD, PhD; Ray Qian, MD; Heather A. Heaton, MD; David M. Nestler, MD; Shant Ayanian, MD; Thomas C. Kingsley, MD

Mayo Clinic Proceedings: Innovations, Quality & Outcomes (Jun 2022)

Affiliations

Alexander J. Ryu, MD: Division of Hospital Internal Medicine, Mayo Clinic, Rochester, MN; Correspondence: Address to Alexander J. Ryu, MD, 1216 2nd St SW, Old Marion Hall, 4th Floor, Rochester, MN 55902.
Santiago Romero-Brufau, MD, PhD: Department of Medicine, Mayo Clinic, Rochester, MN
Ray Qian, MD: Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN
Heather A. Heaton, MD: Department of Emergency Medicine, Mayo Clinic, Rochester, MN
David M. Nestler, MD: Department of Emergency Medicine, Mayo Clinic, Rochester, MN
Shant Ayanian, MD: Division of Hospital Internal Medicine, Mayo Clinic, Rochester, MN
Thomas C. Kingsley, MD: Division of Hospital Internal Medicine, Mayo Clinic, Rochester, MN

Journal volume & issue: Vol. 6, no. 3, pp. 193–199

Abstract

Objective: To assess the generalizability of a clinical machine learning algorithm across multiple emergency departments (EDs).

Patients and Methods: We obtained data on all ED visits at our health care system’s largest ED from May 5, 2018, to December 31, 2019. We also obtained data from 3 satellite EDs and 1 distant-hub ED from May 1, 2018, to December 31, 2018. A gradient-boosted machine model was trained on pooled data from the included EDs. To control for the effect of differing training set sizes, the data were randomly downsampled to match the sample size of our smallest ED. A second model was trained on these downsampled, pooled data. The models’ performance was compared using the area under the receiver operating characteristic curve (AUC). Finally, site-specific models were trained and tested across all the sites, and the importance of features was examined to understand the reasons for differing generalizability.

Results: The training data sets contained 1918 to 64,161 ED visits. The AUC for the pooled model ranged from 0.84 to 0.94 across the sites; performance decreased slightly when sample sizes were downsampled to match that of our smallest ED site. When site-specific models were trained and tested across all the sites, the AUCs ranged more widely, from 0.71 to 0.93. Within a single ED site, the performance of the 5 site-specific models was most variable for our largest and smallest EDs. Finally, when the importance of features was examined, several features were common to all site-specific models; however, the weight of these features differed.

Conclusion: A machine learning model for predicting hospital admission from the ED will generalize fairly well within the health care system but will still have significant differences in AUC performance across sites because of site-specific factors.
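The evaluation design described in the abstract (downsampling each site to the size of the smallest ED, then scoring every model against every site by AUC) can be illustrated with a minimal Python sketch. This is not the authors' code: the record layout (a single numeric feature plus an admission label), the function names, and the use of plain callables in place of trained gradient-boosted models are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record layout: (single numeric feature, admitted label 0/1).
Visit = Tuple[float, int]


def downsample_to_smallest(sites: Dict[str, List[Visit]], seed: int = 0) -> Dict[str, List[Visit]]:
    """Randomly downsample every site to the number of visits at the smallest site."""
    rng = random.Random(seed)
    n_min = min(len(visits) for visits in sites.values())
    return {name: rng.sample(visits, n_min) for name, visits in sites.items()}


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_site_auc(
    models: Dict[str, Callable[[float], float]],
    test_sets: Dict[str, List[Visit]],
) -> Dict[Tuple[str, str], float]:
    """Score every model on every site's test set; keys are (model name, site name)."""
    results = {}
    for model_name, model in models.items():
        for site_name, visits in test_sets.items():
            scores = [model(x) for x, _ in visits]
            labels = [y for _, y in visits]
            results[(model_name, site_name)] = auc(scores, labels)
    return results
```

In this sketch, the spread of values in one column of the `cross_site_auc` result (all models tested on one site) corresponds to the within-site variability the abstract reports. The pairwise AUC computation is quadratic in the number of visits and is meant only to make the definition explicit; a rank-based implementation would be preferable at the data set sizes reported.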

Published in Mayo Clinic Proceedings: Innovations, Quality & Outcomes

ISSN: 2542-4548 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General)
Website: https://www.journals.elsevier.com/mayo-clinic-proceedings-innovations-quality-and-outcomes/
