International Journal of Population Data Science (Dec 2020)

How Are Linkage Results Using Privacy-Preserving Record Linkage Different?

  • Michael Jarrett
  • Brent Hills
  • Yinshan Zhao
  • Adrian Brown
  • Sean Randall
  • James Boyd
  • Anna Ferrante
  • Kimberlyn McGrail

Journal volume & issue
Vol. 5, no. 5

Abstract

Introduction
Privacy-Preserving Record Linkage (PPRL) offers an opportunity to improve privacy protection when performing record linkage on the most sensitive data. Our linkage agency currently performs all linkages in clear text, but the expansion of data sources now includes extremely sensitive data, such as justice data. Recognizing that specific circumstances may demand different approaches to linkage, we evaluated a PPRL algorithm implemented through the LinXmart software. This is the first real-world evaluation of PPRL in British Columbia and among the first in Canada.

Objectives and Approach
Our standard linkage method is probabilistic and relies on rules established by analysts to determine which links are accepted. Datasets are linked to a population spine (N = 8,440,442) containing all current and past residents of the province. LinXmart was configured to link each record to its top-weighted candidate when that candidate's weight exceeded a predetermined confidence threshold. We evaluated performance by comparing the standard method with PPRL on three datasets of increasing complexity (messiness). Initial results on the simplest, cleanest dataset informed an iterative process to improve the implementation of PPRL.

Results
Overall linkage rates were higher for standard linkage (90%) than for PPRL (81%). Records with a unique ID linked at similarly high rates in clear text and with PPRL, while PPRL performance on records without the unique ID varied with the exact parameters chosen for the match threshold and field comparisons.

Conclusion / Implications
This work suggests that, for datasets that include a well-populated unique identifier, PPRL can be implemented in real-world linkages without a substantial drop-off in linkage quality. Messier data require careful tuning of linkage parameters to match the performance of clear-text linkage. PPRL may be best suited to cases where clear-text identifiers cannot be shared and where some degradation in linkage rates is acceptable.