Learning Health Systems (Jan 2024)

Automated generation of comparator patients in the electronic medical record

  • Joseph Rigdon,
  • Brian Ostasiewski,
  • Kamah Woelfel,
  • Kimberly D. Wiseman,
  • Tim Hetherington,
  • Stephen Downs,
  • Marc Kowalkowski

DOI
https://doi.org/10.1002/lrh2.10362
Journal volume & issue
Vol. 8, no. 1

Abstract

Background: Well-designed randomized trials provide high-quality clinical evidence but are not always feasible or ethical. In their absence, the electronic medical record (EMR) presents a platform to conduct comparative effectiveness research, central to the emerging academic learning health system (aLHS) model. A barrier to realizing this vision is the lack of a process to efficiently generate a reference comparison group for each patient.

Objective: To test a multi-step process for the selection of comparators in the EMR.

Materials and Methods: We conducted a mixed-methods study within a large aLHS in North Carolina. We (1) created a list of 35 candidate variables; (2) surveyed 270 researchers to assess the importance of candidate variables; and (3) built consensus rankings around survey-identified variables (i.e., importance scores >7) across two panels of 7–8 clinical research experts. Prioritized algorithm inputs were collected from the EMR and applied using a greedy matching technique. Feasibility was measured as the percentage of patients with 100 matched comparators, and performance was measured via computational time and Euclidean distance.

Results: Nine variables were selected: age, sex, race, ethnicity, body mass index, insurance status, smoking status, Charlson Comorbidity Index, and neighborhood percentage in poverty. The final process successfully generated 100 matched comparators for each of 1.8 million candidate patients, executed in less than 100 min for the majority of strata, and had an average Euclidean distance of 0.043.

Conclusion: EMR-derived matching is feasible to implement across a diverse patient population and can provide a reproducible, efficient source of comparator data for observational studies, with additional testing in clinical research applications needed.
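
As an illustration only, the greedy matching step named in the abstract might look like the following minimal Python sketch. The abstract does not say how variables were scaled, how categorical variables were handled, or whether a comparator could be reused across patients, so the min-max scaling, the `reuse` flag, and the function names (`min_max_scale`, `greedy_match`) are all assumptions made for this example, not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_scale(rows):
    """Rescale each column to [0, 1] (assumed here so that no single
    variable dominates the distance; the abstract does not specify scaling)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def greedy_match(index_rows, pool_rows, k, reuse=False):
    """Greedy nearest-neighbor matching.

    Index patients are processed in order; each takes its k closest
    still-available pool rows, and earlier choices are never revisited.
    With reuse=False (an assumption), a matched pool row is unavailable
    to later index patients. Returns (matches, mean_distance), where
    matches[i] lists the pool indices chosen for index patient i.
    """
    available = set(range(len(pool_rows)))
    matches, distances = [], []
    for target in index_rows:
        # Rank available comparators by distance; ties broken by pool index.
        ranked = sorted(
            available, key=lambda j: (euclidean(target, pool_rows[j]), j)
        )
        chosen = ranked[:k]
        matches.append(chosen)
        distances.extend(euclidean(target, pool_rows[j]) for j in chosen)
        if not reuse:
            available.difference_update(chosen)
    mean_distance = sum(distances) / len(distances) if distances else float("nan")
    return matches, mean_distance


if __name__ == "__main__":
    # Hypothetical toy data: columns are (age, body mass index).
    raw = [[50, 27.0], [48, 26.5], [71, 31.0], [52, 27.5], [30, 22.0]]
    scaled = min_max_scale(raw)
    index_patients, pool = scaled[:1], scaled[1:]
    matches, mean_distance = greedy_match(index_patients, pool, k=2)
    print(matches, round(mean_distance, 3))
```

Under this sketch, the feasibility measure in the abstract would correspond to the share of index patients whose match list reaches the requested size `k`, and the mean distance returned here plays the role of the reported average Euclidean distance. Sorting the whole pool for every patient is quadratic overall, so at the scale described in the abstract a production version would presumably match within strata or use a spatial index.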

Keywords