IEEE Access (Jan 2018)
Head to Head: Semantic Similarity of Multi–Word Terms
Abstract
Terms are linguistic signifiers of domain-specific concepts. Semantic similarity between terms refers to the corresponding distance in the conceptual space. In this paper, we use lexico-syntactic information to define a vector space representation in which cosine similarity closely approximates semantic similarity between the corresponding terms. Given a multi-word term, each word is weighed in terms of its defining properties. In this context, the head noun is given the highest weight. Other words are weighed depending on their relations to the head noun. We formalized the problem as that of determining a topological ordering of a direct acyclic graph, which is based on constituency and dependency relations within a noun phrase. To counteract the errors associated with automatically inferred constituency and dependency relations, we implemented a heuristic approach to approximating the topological ordering. Different weights are assigned to different words based on their positions. Clustering experiments performed on such a vector space representation showed considerable improvement over the conventional bag-of-word representation. Specifically, it more consistently reflected semantic similarity between the terms. This was established by analyzing the differences between automatically generated dendrograms and manually constructed taxonomies. In conclusion, our method can be used to semi-automate taxonomy construction.
Keywords