Entropy (May 2021)

FTRLIM: Distributed Instance Matching Framework for Large-Scale Knowledge Graph Fusion

  • Hongming Zhu,
  • Xiaowen Wang,
  • Yizhi Jiang,
  • Hongfei Fan,
  • Bowen Du,
  • Qin Liu

DOI
https://doi.org/10.3390/e23050602
Journal volume & issue
Vol. 23, no. 5
p. 602

Abstract

Instance matching is a key task in knowledge graph fusion, and improving its efficiency is critical given the increasing scale of knowledge graphs. Blocking algorithms, which select candidate instance pairs for comparison, are an effective way to achieve this goal. In this paper, we propose a novel blocking algorithm named MultiObJ, which constructs indexes for instances based on the Ordered Joint of Multiple Objects’ features to limit the number of candidate instance pairs. Based on MultiObJ, we further propose a distributed framework named Follow-the-Regular-Leader Instance Matching (FTRLIM), which matches instances between large-scale knowledge graphs with approximately linear time complexity. FTRLIM participated in OAEI 2019 and achieved the best matching quality with significant efficiency. In this research, we construct three data collections based on a real-world large-scale knowledge graph. Experimental results on the constructed data collections and two real-world datasets indicate that MultiObJ and FTRLIM outperform other state-of-the-art methods.
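
The general idea of blocking can be illustrated with a minimal Python sketch. This is not the MultiObJ algorithm or the FTRLIM implementation, whose details the abstract does not give; it only assumes that each instance is a mapping from predicate names to lists of object values, and that an index key is formed by joining the normalized values of a few chosen predicates in a fixed order. The function names (`block_key`, `build_index`, `candidate_pairs`) and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def block_key(instance, predicates):
    """Join the normalized object values of the chosen predicates in a fixed order."""
    parts = []
    for pred in predicates:
        # Sort the values so the key does not depend on the order they were stored in.
        values = sorted(str(v).strip().lower() for v in instance.get(pred, []))
        parts.append("|".join(values))
    return "#".join(parts)


def build_index(instances, predicates):
    """Map each block key to the ids of the instances that share it."""
    index = defaultdict(list)
    for inst_id, inst in instances.items():
        index[block_key(inst, predicates)].append(inst_id)
    return index


def candidate_pairs(source, target, predicates):
    """Pair only the instances whose block keys are identical in both graphs."""
    src_index = build_index(source, predicates)
    tgt_index = build_index(target, predicates)
    pairs = []
    for key in sorted(src_index.keys() & tgt_index.keys()):
        pairs.extend(product(src_index[key], tgt_index[key]))
    return pairs


# Hypothetical toy data: two small "graphs" with two instances each.
source = {
    "s1": {"name": ["Tongji University"], "city": ["Shanghai"]},
    "s2": {"name": ["Fudan University"], "city": ["Shanghai"]},
}
target = {
    "t1": {"name": ["tongji university "], "city": ["shanghai"]},
    "t2": {"name": ["Peking University"], "city": ["Beijing"]},
}

pairs = candidate_pairs(source, target, ["name", "city"])
print(pairs)
```

On this toy input, only one of the four possible source-target pairs shares a key and is kept for comparison. Because each instance is indexed once and comparison is restricted to matching keys, the work grows with the number of instances plus the size of the shared blocks rather than with the full cross product.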

Keywords