International Journal of Population Data Science (Sep 2023)

Understanding data provenance when using electronic medical records for research: Lessons learned from the Deliver Primary Healthcare Information (DELPHI) database

  • Jason Edward Black,
  • Amanda Terry,
  • Sonny Cejic,
  • Thomas Freeman,
  • Daniel Lizotte,
  • Scott McKay,
  • Mark Speechley,
  • Bridget Ryan

DOI
https://doi.org/10.23889/ijpds.v8i5.2177
Journal volume & issue
Vol. 8, no. 5

Abstract

Introduction: We set out to assess the impact of Choosing Wisely Canada recommendations (2014) on reducing unnecessary health investigations and interventions in primary care across Southwestern Ontario.

Methods: We used the Deliver Primary Healthcare Information (DELPHI) database, which stores deidentified electronic medical records (EMR) of nearly 65,000 primary care patients across Southwestern Ontario. When conducting research using EMR data, data provenance (i.e., how the data came to be) should first be established. We first considered DELPHI data provenance in relation to longitudinal analyses, flagging a change in EMR software that occurred during 2012 and 2013. We attempted to link records between EMR databases produced by different software using probabilistic linkage and inspected 10 years of data in the DELPHI database (2009 to 2019) for data quality issues, including comparability over time.

Results: We encountered several issues resulting from this change in EMR software. These included limited linkage of records between software without a common identifier; data migration issues that distorted procedure dates; and unusual changes in laboratory test and medication prescription volumes.

Conclusion: This study reinforces the necessity of assessing data provenance and quality for new research projects. By understanding data provenance, we can anticipate related data quality issues such as changes in EMR data over time, which represent a growing concern as longitudinal data analyses increase in feasibility and popularity.
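As an illustration of the probabilistic linkage mentioned in the Methods, the following is a minimal sketch of Fellegi-Sunter style match scoring between two records that share no common identifier. It is not the authors' procedure: the field names (`birth_year`, `sex`, `postal_prefix`), the m- and u-probabilities, and the thresholds are all hypothetical values chosen only to show the mechanics.

```python
import math

# Hypothetical m- and u-probabilities per field (illustrative values only,
# not estimates from the DELPHI database).
#   m = P(field agrees | records belong to the same patient)
#   u = P(field agrees | records belong to different patients)
FIELD_PARAMS = {
    "birth_year": (0.98, 0.02),
    "sex": (0.99, 0.50),
    "postal_prefix": (0.90, 0.05),
}


def match_weight(record_a, record_b, params=FIELD_PARAMS):
    """Sum log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        value_a = record_a.get(field)
        value_b = record_b.get(field)
        if value_a is None or value_b is None:
            # A missing value gives no evidence either way, so skip the field.
            continue
        if value_a == value_b:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, lower=0.0, upper=6.0):
    """Map a weight to a linkage decision using two assumed thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


if __name__ == "__main__":
    old_emr = {"birth_year": 1960, "sex": "F", "postal_prefix": "N6A"}
    new_emr = {"birth_year": 1960, "sex": "F", "postal_prefix": "N5Y"}
    w = match_weight(old_emr, new_emr)
    print(round(w, 2), classify(w))
```

Fields that discriminate poorly, such as sex, contribute little weight when they agree, which is one reason linkage without a common identifier can remain limited when only a few deidentified fields are available.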

Keywords