Comparative Analysis of Word Embeddings in Assessing Semantic Similarity of Complex Sentences

Dhivya Chandrasekaran; Vijay Mago

doi:10.1109/ACCESS.2021.3135807

IEEE Access (Jan 2021)


Affiliations

Dhivya Chandrasekaran: Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada
Vijay Mago: Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada

Journal volume & issue: Vol. 9
pp. 166395 – 166408

Abstract

Semantic textual similarity is one of the open research challenges in the field of Natural Language Processing. Extensive research has been carried out in this field, and recent transformer-based models achieve near-perfect results on existing benchmark datasets such as the STS dataset and the SICK dataset. In this paper, we study the sentences in these datasets and analyze the sensitivity of various word embeddings to the complexity of the sentences. We build a complex sentence dataset comprising 50 sentence pairs with associated semantic similarity values provided by 15 human annotators. Readability analysis is performed to highlight the increase in complexity from the sentences in the existing benchmark datasets to those in the proposed dataset. Further, we perform a comparative analysis of the performance of various word embeddings and language models on the existing benchmark datasets and the proposed dataset. The results show that the increase in sentence complexity has a significant impact on the performance of the embedding models, resulting in a 10-20% decrease in Pearson’s and Spearman’s correlation.
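
The evaluation described in the abstract compares model-predicted similarity scores against human annotations using Pearson’s and Spearman’s correlation. The following is a minimal sketch of that kind of comparison, not the authors' implementation: the function names, the cosine-similarity scoring of embedding vectors, and the toy scores are all assumptions made for illustration, and the numbers are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

if __name__ == "__main__":
    # Hypothetical scores for five sentence pairs (illustrative only).
    human_scores = [4.8, 3.9, 2.5, 1.2, 0.4]       # e.g. mean annotator rating
    model_scores = [0.91, 0.85, 0.70, 0.52, 0.47]  # e.g. cosine of sentence vectors
    print("Pearson: ", round(pearson(model_scores, human_scores), 3))
    print("Spearman:", round(spearman(model_scores, human_scores), 3))
```

Pearson’s correlation measures how linearly the model scores track the human ratings, while Spearman’s correlation only checks whether the two orderings of sentence pairs agree, which is why similarity evaluations often report both.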

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
