Intelligent Systems with Applications (Feb 2023)
Synset2Node: A new synset embedding based upon graph embeddings
Abstract
Thanks to advances made in recent years, embedding methods have significantly increased the accuracy of text and graph processing methods. Embedding methods provide a compact vector representation of the basic elements (words, synsets, nodes, etc.) of the underlying system in order to encode the semantic information between those elements. Due to the polysemous nature of words, sense/synset embeddings outperform word embeddings in some NLP tasks. However, synset embeddings have received less attention in the literature. Existing synset embedding methods either require complex calculations to derive synset embeddings from word embeddings, or are based upon a predefined pairwise synset similarity. In this paper, considering the graphical structure of WordNet and the high-level knowledge encoded in it, we create a synset embedding directly from the WordNet graph and its synset relations. The Node2Vec graph embedding is used to map the nodes of this graph to a vector space. We evaluate the performance of different graph structures (e.g., weighted/unweighted and directed/undirected graphs). Moreover, we propose a weighting strategy that weights the different synset relation types in the resulting WordNet graph. Experimental results on the task of measuring lexical semantic similarity show that the mean squared error of the proposed synset embedding on the MEN and WordSim353 datasets is 0.065 and 0.035, respectively, which is better than that of Word2Vec on the same datasets (0.073 and 0.045, respectively). Furthermore, we use the Pearson and Spearman correlations to compare the proposed synset embedding method with state-of-the-art ones. The obtained results show the efficiency of the proposed method on various datasets: the Spearman correlation on SimLex999 is improved by 0.02, and the Pearson correlation on WordSim353 by 0.14.
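The core idea of the abstract (build a graph over WordNet synsets, weight edges by relation type, then run Node2Vec's biased random walks to produce node sequences for a skip-gram model) can be sketched in a minimal, self-contained form. This is an illustration under stated assumptions, not the paper's implementation: the toy synset graph, the `REL_WEIGHT` values, and the function names are all hypothetical, and the walk uses Node2Vec's second-order transition rule with return parameter `p` and in-out parameter `q`.

```python
import random

# Hypothetical relation-type weights (illustrative only; the paper's
# weighting strategy and values are not reproduced here).
REL_WEIGHT = {"hypernym": 1.0, "hyponym": 1.0, "meronym": 0.5}

# Tiny toy fragment of a WordNet-style synset graph.
EDGES = [
    ("dog.n.01", "canine.n.02", "hypernym"),
    ("canine.n.02", "carnivore.n.01", "hypernym"),
    ("dog.n.01", "puppy.n.01", "hyponym"),
    ("dog.n.01", "flag.n.07", "meronym"),
    ("wolf.n.01", "canine.n.02", "hypernym"),
]

def build_graph(edges):
    """Undirected weighted adjacency: node -> {neighbor: weight}."""
    adj = {}
    for u, v, rel in edges:
        w = REL_WEIGHT[rel]
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def node2vec_walk(adj, start, length, p=1.0, q=0.5, rng=random):
    """One Node2Vec-style second-order biased random walk."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(adj[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # First step: plain weighted choice among neighbors.
            weights = [adj[cur][n] for n in nbrs]
        else:
            prev = walk[-2]
            weights = []
            for n in nbrs:
                w = adj[cur][n]
                if n == prev:          # returning to the previous node
                    weights.append(w / p)
                elif n in adj[prev]:   # staying close to prev (BFS-like)
                    weights.append(w)
                else:                  # moving outward (DFS-like)
                    weights.append(w / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

The generated walks would then be fed, as "sentences" of synset identifiers, to a skip-gram model (e.g., gensim's `Word2Vec`) to obtain the final synset vectors; a full pipeline would build the graph from NLTK's WordNet interface rather than a hand-written edge list.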