Coarse-to-Fine Entity Alignment for Chinese Heterogeneous Encyclopedia Knowledge Base

Meng Wu; Tingting Jiang; Chenyang Bu; Bin Zhu

doi:10.3390/fi14020039

Future Internet (Jan 2022)

Affiliations

Meng Wu: Ministry of Education Key Laboratory of Knowledge Engineering with Big Data, School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
Tingting Jiang: Ministry of Education Key Laboratory of Knowledge Engineering with Big Data, School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
Chenyang Bu: Ministry of Education Key Laboratory of Knowledge Engineering with Big Data, School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
Bin Zhu: Anhui Province Key Laboratory of Infrared and Low Temperature Plasma, College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China

Journal volume & issue: Vol. 14, no. 2
p. 39

Abstract

Entity alignment (EA) aims to automatically determine whether a pair of entities from different knowledge bases or knowledge graphs refers to the same real-world entity. Inspired by human cognitive mechanisms, we propose a coarse-to-fine entity alignment model (CFEA) consisting of three stages: coarse-grained, middle-grained, and fine-grained. In the coarse-grained stage, a pruning strategy based on entity-type restrictions reduces the number of candidate matching entities; the goal of this stage is to filter out entity pairs that are clearly not the same entity. In the middle-grained stage, we compute the similarity of entity pairs from key attribute values and matched attribute values, with the goal of identifying pairs that are obviously different entities or obviously the same entity. This step further reduces the number of candidate entity pairs. In the fine-grained stage, contextual information, such as abstract and description text, is taken into account, and topic modeling is carried out to achieve more accurate matching. The basic idea of this stage is to use additional information to judge entity pairs that are difficult to distinguish using the basic information from the first two stages. Experimental results on real-world datasets verify the effectiveness of our model compared with baselines.
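The three-stage pipeline described in the abstract can be sketched as a cascade of filters. This is a minimal illustration under stated assumptions, not the authors' CFEA implementation: the `Entity` structure, the "same type" pruning rule, the key attribute `birth_date`, and the thresholds `HIGH`, `LOW`, and `CONTEXT_THRESHOLD` are all hypothetical. The paper uses topic modeling in the fine-grained stage; here a plain bag-of-words cosine similarity is swapped in as a simpler stand-in, and whitespace tokenization is assumed (real Chinese text would first need word segmentation).

```python
from collections import Counter
from dataclasses import dataclass, field
import math


@dataclass
class Entity:
    name: str
    etype: str                                  # entity type, e.g. "Person"
    attrs: dict = field(default_factory=dict)   # attribute name -> value
    context: str = ""                           # abstract/description text


# Hypothetical parameters, chosen for illustration only (not from the paper).
KEY_ATTRS = ("birth_date",)
HIGH, LOW = 0.8, 0.2
CONTEXT_THRESHOLD = 0.5


def coarse(pairs):
    """Stage 1 (assumed rule): prune pairs whose entity types differ."""
    return [(a, b) for a, b in pairs if a.etype == b.etype]


def attr_similarity(a, b):
    """Fraction of shared attribute names whose values are equal."""
    shared = set(a.attrs) & set(b.attrs)
    if not shared:
        return 0.0
    return sum(a.attrs[k] == b.attrs[k] for k in shared) / len(shared)


def middle(pairs):
    """Stage 2: accept obvious matches, drop obvious non-matches, defer the rest."""
    matched, uncertain = [], []
    for a, b in pairs:
        # A conflicting key attribute rules the pair out immediately.
        if any(k in a.attrs and k in b.attrs and a.attrs[k] != b.attrs[k]
               for k in KEY_ATTRS):
            continue
        s = attr_similarity(a, b)
        if s >= HIGH:
            matched.append((a, b))
        elif s > LOW:
            uncertain.append((a, b))
    return matched, uncertain


def cosine(x, y):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(x[t] * y[t] for t in x)
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def fine(pairs):
    """Stage 3: compare context text (bag-of-words stand-in for topic vectors)."""
    return [(a, b) for a, b in pairs
            if cosine(Counter(a.context.split()),
                      Counter(b.context.split())) >= CONTEXT_THRESHOLD]


def align(pairs):
    """Run the coarse -> middle -> fine cascade and return accepted pairs."""
    matched, uncertain = middle(coarse(pairs))
    return matched + fine(uncertain)


# Toy entities (hypothetical data) exercising each stage.
a1 = Entity("e1", "Person", {"birth_date": "1900-01-01", "nationality": "C"}, "poet and writer")
b1 = Entity("f1", "Person", {"birth_date": "1900-01-01", "nationality": "C"}, "writer")
a2 = Entity("e2", "City", {"province": "P", "area": "100"}, "capital city of province P")
b2 = Entity("f2", "City", {"province": "P", "area": "101"}, "the capital city of province P in country C")
a3 = Entity("e3", "Person", {"birth_date": "1900-01-01"}, "poet")
b3 = Entity("f3", "Person", {"birth_date": "1950-05-05"}, "poet")
pairs = [(a1, b1), (a1, b2), (a2, b2), (a3, b3)]

print([(a.name, b.name) for a, b in align(pairs)])
```

In this toy run, `(e1, f2)` is pruned in stage 1 because the types differ, `(e3, f3)` is dropped in stage 2 on the conflicting key attribute, `(e1, f1)` is accepted in stage 2 with full attribute agreement, and `(e2, f2)` is deferred (attribute similarity 0.5) and then accepted in stage 3 on context similarity. The cascade design means the comparatively expensive context comparison only runs on the pairs the cheaper stages could not settle.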

Published in Future Internet

ISSN: 1999-5903 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/futureinternet/
