GMS Medizinische Informatik, Biometrie und Epidemiologie (Jun 2016)

Quality of record linkage in a highly automated cancer registry that relies on encrypted identity data

  • Schmidtmann, Irene
  • Sariyar, Murat
  • Borg, Andreas
  • Gerold-Ay, Aslihan
  • Heidinger, Oliver
  • Hense, Hans-Werner
  • Krieg, Volker
  • Hammer, Gaël Paul

DOI
https://doi.org/10.3205/mibe000164
Journal volume & issue
Vol. 12, no. 1
p. Doc02

Abstract

Objectives: In the absence of unique ID numbers, cancer and other registries in Germany and elsewhere rely on identity data to link records pertaining to the same patient. These data are often encrypted to ensure privacy. Some record linkage errors unavoidably occur. These errors were quantified for the cancer registry of North Rhine-Westphalia, which uses encrypted identity data. Methods: A sample of records, including their record linkage information, was drawn from the registry. In parallel, plain text data for these records were retrieved to generate a gold standard. Record linkage error frequencies in the cancer registry were determined by comparing the results of the routine linkage with the gold standard. Error rates were projected to larger registries. Results: In the sample studied, the homonym error rate was 0.015% and the synonym error rate was 0.2%. The F-measure was 0.9921. Projection to larger databases indicated that, under a realistic development scenario, the homonym error rate will be around 1% and the synonym error rate around 2%. Conclusion: Observed error rates are low. This shows that effective methods to standardize and improve the quality of the input data have been implemented. This is crucial for keeping error rates low as the registry’s database grows. The planned inclusion of unique health insurance numbers is likely to further improve record linkage quality. Cancer registration based entirely on electronic notification of records can process large amounts of data with high record linkage quality.
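
For readers unfamiliar with the measures named above, the following minimal sketch shows how linkage error rates and an F-measure can be derived once routine linkage results have been compared with a gold standard. It assumes a pairwise convention in which a homonym error is a link between records of different patients and a synonym error is a missed link between records of the same patient. The abstract does not state the denominators used in the study, so the definitions below are assumptions, the names `PairCounts` and `linkage_quality` are hypothetical, and the counts are invented for illustration. It is not an attempt to reproduce the reported figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairCounts:
    """Record-pair counts from comparing routine linkage with a gold standard."""
    true_links: int    # pairs linked by the registry that belong to the same patient
    false_links: int   # homonym errors: linked pairs that belong to different patients
    missed_links: int  # synonym errors: same-patient pairs the registry did not link


def linkage_quality(counts: PairCounts) -> dict:
    """Compute pairwise error rates and the F-measure.

    Assumed definitions (not taken from the article):
      homonym error rate = false links / all links made
      synonym error rate = missed links / all true matches
    """
    links_made = counts.true_links + counts.false_links
    true_matches = counts.true_links + counts.missed_links

    precision = counts.true_links / links_made
    recall = counts.true_links / true_matches
    # F-measure: harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)

    return {
        "homonym_error_rate": counts.false_links / links_made,
        "synonym_error_rate": counts.missed_links / true_matches,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Invented counts, for illustration only
    example = PairCounts(true_links=90, false_links=10, missed_links=10)
    for name, value in linkage_quality(example).items():
        print(f"{name}: {value:.4f}")
```

Under these assumed definitions, precision equals one minus the homonym error rate and recall equals one minus the synonym error rate, so the F-measure summarizes both error types in a single number.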

Keywords