IEEE Access (Jan 2018)
Learning Distance Metrics for Entity Resolution
Abstract
Entity resolution (ER) is the task of finding database records that refer to the same real-world entity. A key component of ER is choosing a proper distance (similarity) function for each database field to quantify the similarity of records. Most existing ER approaches focus on how to define a proper matching rule based on generic or hand-crafted distance metrics. In this paper, we explore two learnable string distance metrics for two kinds of ER problems, trained using principal component analysis and the large margin nearest neighbor algorithm. Experimental results on real data sets show that our approaches can improve entity resolution accuracy over traditional techniques.
Keywords