International Journal of Population Data Science (Nov 2019)
Evaluation measure for group-based record linkage
Abstract
Traditionally, record linkage is concerned with linking pairs of records across data sets and the classification of such pairs into matches (assumed to refer to the same individual) and non-matches (assumed to refer to different individuals). Increasingly, however, more complex data sets are being linked where often the aim is to identify groups, or clusters, of records that refer to the same individual or to a group of related individuals. Examples include finding the records of all births to the same parents or all medical records generated by members of the same family. When ground truth data in the form of known true matches and non-matches are available, then linkage quality is traditionally evaluated based on the classified versus the true matches (links) using measures such as precision (also known as the positive predictive value) and recall (also known as sensitivity or the true positive rate). The quality of clusters generated in record linkage is of high importance, since the comparison of different linkage methods is largely based on the values obtained by such evaluation measures. However, minimal research has been conducted thus far to evaluate the suitability of existing evaluation measures in the context of linking groups of records. As we show, evaluation measures such as precision and recall are not suitable for evaluating groups of linked records because they evaluate the quality of individually linked record pairs rather than the quality of records grouped into clusters. We highlight the shortcomings of traditional evaluation measures and then propose a novel approach to evaluate cluster quality in the context of group-based record linkage. We empirically evaluate our proposed approach using real-world data and show that it better reflects the quality of clusters generated by a group-based record linkage technique.
Keywords