Applied Sciences (Sep 2021)

Record Linkage of Chinese Patent Inventors and Authors of Scientific Articles

  • Robert Nowak,
  • Wiktor Franus,
  • Jiarui Zhang,
  • Yue Zhu,
  • Xin Tian,
  • Zhouxian Zhang,
  • Xu Chen,
  • Xiaoyu Liu

DOI
https://doi.org/10.3390/app11188417
Journal volume & issue
Vol. 11, no. 18
p. 8417

Abstract

We present an algorithm that matches inventors listed on patents with the authors of scientific articles. The persons are given as records in Scopus and the Chinese Patents Database. This task is known as the record linkage problem, defined as finding and linking individual records from separate databases that refer to the same real-world entity. The presented solution is based on a record linkage framework combined with text feature extraction and machine learning techniques. The main challenges were low data quality, the lack of common record identifiers, and the limited number of other attributes shared by both data sources. Matching based solely on an exact comparison of authors’ names does not solve the record linkage problem because many Chinese authors share the same full name. Moreover, the English spelling of Chinese names is not standardized in the analyzed data. We propose three ways to extend the attribute set and improve record linkage quality: (1) fuzzy matching of names, (2) comparison of the abstracts of patents and articles, and (3) comparison of scientists’ main research areas, calculated from all available metadata. The presented solution was evaluated in terms of matching quality and complexity on ≈250,000 record pairs linked by human experts. The results of numerical experiments show that the proposed strategies increase the quality of record linkage compared to typical solutions.

Keywords