String Matching Techniques: An Empirical Assessment Based on Statistics Austria's Business Register

Michaela Denk; Peter Hackl; Norbert Rainer

doi:10.17713/ajs.v34i3.415

Austrian Journal of Statistics (Apr 2016)

Affiliations

Michaela Denk: ec3 Electronic Commerce Competence Center, Vienna
Peter Hackl: University of Economics and Business Administration, Vienna
Norbert Rainer: Statistics Austria, Vienna

Journal volume & issue: Vol. 34, no. 3

Abstract

The maintenance and updating of Statistics Austria's business register requires regular matching of the register against other data sources, one of which is the register of tax units of the Austrian Federal Ministry of Finance. The matching process is based on string comparison via bigrams of enterprise names and addresses, combined with a quality class approach that assigns pairs of register units to classes of different compliance (i.e., matching quality) based on bigram similarity values and the comparison of other matching variables, such as the NACE code or the year of foundation. Based on methodological research on matching techniques carried out in the DIECOFIS project, an empirical comparison of the bigram method with other string matching techniques was conducted: the edit distance, the Jaro algorithm, the Jaro-Winkler algorithm, the longest common subsequence, and the maximal match were selected as appropriate alternatives and evaluated in the study. This paper briefly introduces Statistics Austria's business register and the corresponding maintenance process and reports on the results of the empirical study.
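
To make the comparison named in the abstract more concrete, the following is a minimal Python sketch of two of the techniques: a bigram similarity and the edit distance. The abstract does not specify how the bigram similarity is computed, so the Dice coefficient over bigram counts is used here as an assumption. The normalization (lower-casing, collapsing whitespace) and the function names `bigrams`, `bigram_similarity` and `edit_distance` are likewise hypothetical and are not taken from the paper or from Statistics Austria's matching process.

```python
from collections import Counter


def bigrams(text):
    """Return the multiset of character bigrams of a normalized string.

    Assumption: normalization is lower-casing plus whitespace collapsing.
    """
    s = " ".join(text.lower().split())
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def bigram_similarity(a, b):
    """Dice coefficient on bigram counts, in [0, 1] (assumed formula)."""
    ca, cb = bigrams(a), bigrams(b)
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        # Neither string is long enough to contain a bigram.
        return 0.0
    shared = sum((ca & cb).values())  # multiset intersection
    return 2.0 * shared / total


def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical enterprise names, for illustration only.
    name_a = "Mueller Bau GmbH"
    name_b = "Müller Bau GesmbH"
    print(bigram_similarity(name_a, name_b))
    print(edit_distance(name_a.lower(), name_b.lower()))
```

Note that the two measures point in opposite directions: the bigram similarity grows with agreement, whereas the edit distance grows with disagreement. A comparison across techniques would therefore typically rescale the distance, for instance by the length of the longer string.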

Published in Austrian Journal of Statistics

ISSN: 1026-597X (Print)
Publisher: Austrian Statistical Society
Country of publisher: Austria
LCC subjects: Science: Mathematics: Probabilities. Mathematical statistics; Social Sciences: Statistics
Website: http://www.ajs.or.at
