Statistics and Public Policy (Jan 2017)
ADGN: An Algorithm for Record Linkage Using Address, Date of Birth, Gender, and Name
Abstract
This article presents an algorithm for record linkage that uses multiple indicators derived from combinations of fields commonly found in databases. Specifically, the quadruplet of Address (A), Date of Birth (D), Gender (G), and Name (N), as well as any triplet drawn from A-D-G-N (i.e., ADG, ADN, AGN, and DGN), links records with an extremely high likelihood. Matching on multiple identifiers avoids problems of missing data, inconsistent fields, and typographical errors. We show, using a very large database from the State of Texas, that exact matches on combinations of A, D, G, and N produce a match rate comparable to that of the 9-digit Social Security Number. Further examination of the linkage rates shows that reporting the data at a higher level of aggregation, such as Birth Year instead of Date of Birth, and omitting names makes correct matches between databases highly unlikely, protecting an individual’s records.
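The matching rule described in the abstract can be illustrated with a minimal sketch in Python. It is not the authors' implementation. It assumes that two records are linked when they agree exactly on all four of A, D, G, and N or on any three of them, and that a missing value never counts as agreement. The field names (`address`, `dob`, `gender`, `name`), the `normalize` helper, and the whitespace and case normalization are hypothetical choices made for illustration only.

```python
from itertools import combinations

# Hypothetical field names standing in for Address, Date of Birth, Gender, Name.
FIELDS = ("address", "dob", "gender", "name")


def normalize(value):
    """Collapse whitespace and uppercase; treat empty or None as missing.

    This normalization is an assumption of the sketch, not part of the article.
    """
    if value is None:
        return None
    cleaned = " ".join(str(value).split()).upper()
    return cleaned or None


def link_records(left, right, min_fields=3):
    """Return sorted (left_index, right_index) pairs that agree exactly on at
    least `min_fields` of the four fields.

    Agreement on all four fields implies agreement on every triplet, so
    indexing the triplets ADG, ADN, AGN, and DGN also covers ADGN.
    """
    combos = list(combinations(FIELDS, min_fields))

    # Index the right-hand records by every fully populated field combination.
    index = {}
    for j, rec in enumerate(right):
        for combo in combos:
            key = tuple(normalize(rec.get(f)) for f in combo)
            if None not in key:  # a missing field cannot contribute to a match
                index.setdefault((combo, key), set()).add(j)

    # Probe the index with the same combinations from the left-hand records.
    links = set()
    for i, rec in enumerate(left):
        for combo in combos:
            key = tuple(normalize(rec.get(f)) for f in combo)
            if None not in key:
                for j in index.get((combo, key), ()):
                    links.add((i, j))
    return sorted(links)
```

In this sketch, a record whose name is misspelled in one database can still be linked through the ADG triplet, while a record that agrees only on Date of Birth and Gender is left unlinked.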
Keywords