An Efficient Approach for Measuring Semantic Similarity Combining WordNet and Wikipedia

Fei Li; Lejian Liao; Lanfang Zhang; Xinhua Zhu; Bo Zhang; Zheng Wang

doi:10.1109/access.2020.3025611

IEEE Access (Jan 2020)

Affiliations

Fei Li: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Lejian Liao: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Lanfang Zhang: Faculty of Education, Guangxi Normal University, Guilin, China
Xinhua Zhu: Guangxi Key Laboratory of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin, China
Bo Zhang: School of Mathematics and Computer Science, Hezhou University, Hezhou, China
Zheng Wang: School of Computer Science and Engineering, Nanyang Technological University, Singapore

DOI: https://doi.org/10.1109/access.2020.3025611
Journal volume & issue: Vol. 8, pp. 184318 – 184338

Abstract

The measurement of semantic similarity between concepts is an important research topic in natural language processing. Several approaches for measuring the semantic similarity between concepts have been proposed based on WordNet or Wikipedia. However, in most methods, improvements in measurement accuracy have come with a dramatic increase in time complexity, and the existing methods do not effectively integrate WordNet and Wikipedia. In this paper, we focus on designing an efficient semantic similarity method based on WordNet and Wikipedia. First, to improve the accuracy of WordNet edge-based measures, we propose an edge weight model for combining edge and density information, which assigns a weight to each edge adaptively based on the number of direct hyponyms of the subsumer. Second, to improve the computational efficiency of existing Wikipedia link vector-based measures, we propose a new Wikipedia link feature-based semantic similarity method that converts Wikipedia links into semantic knowledge and replaces the TF-IDF statistical weight model used in the existing measures. In addition, we propose two new word disambiguation strategies to further improve the accuracy of Wikipedia link-based measures. Finally, to fully exploit the advantages of WordNet and Wikipedia, we propose two new aggregation schemas that combine WordNet “is-a” semantics with Wikipedia link semantics, replacing the current aggregation schemas that combine WordNet “is-a” semantics with category semantics in Wikipedia. The experimental results show that our aggregation models are outstanding in terms of accuracy, efficiency and word coverage compared to state-of-the-art similarity measures.
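
As a rough illustration of the density-aware edge weighting mentioned in the abstract, the Python sketch below assigns each "is-a" edge a weight that depends on the number of direct hyponyms of its subsumer (parent). Everything in it is an assumption made for illustration: the toy taxonomy, the function names, the weight formula 1 / (1 + ln n), and the distance-to-similarity conversion are not taken from the paper, whose actual model is not given in this record.

```python
import math

# Hypothetical toy "is-a" taxonomy (child -> parent); not real WordNet data.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "bird": "animal",
    "oak": "plant",
}


def direct_hyponyms(node):
    """Count the direct children of a node in the toy taxonomy."""
    return sum(1 for parent in PARENT.values() if parent == node)


def edge_weight(child):
    """Weight of the edge from child to its parent.

    Assumed form (not the paper's formula): the more direct hyponyms
    the parent has, the denser the region and the shorter the edge.
    """
    n = direct_hyponyms(PARENT[child])  # n >= 1, since child is one of them
    return 1.0 / (1.0 + math.log(n))


def distances_to_ancestors(node):
    """Map each ancestor (and the node itself) to its weighted distance."""
    dist = {node: 0.0}
    total = 0.0
    while node in PARENT:
        total += edge_weight(node)
        node = PARENT[node]
        dist[node] = total
    return dist


def weighted_distance(a, b):
    """Weighted path length between a and b through their lowest common subsumer."""
    da = distances_to_ancestors(a)
    db = distances_to_ancestors(b)
    return min(da[n] + db[n] for n in da if n in db)


def similarity(a, b):
    """Convert distance to a similarity in (0, 1]; an assumed, simple mapping."""
    return 1.0 / (1.0 + weighted_distance(a, b))
```

In this sketch, similarity("dog", "cat") comes out higher than similarity("dog", "oak"), because the first pair shares a close, densely populated subsumer while the second pair only meets at the root.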

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
