Open Computer Science (Aug 2016)
An Optimized Lesk-Based Algorithm for Word Sense Disambiguation
Abstract
High computational complexity is a characteristic of almost all Lesk-based algorithms for word sense disambiguation (WSD). In this paper, we address this issue by developing a simple and optimized variant of the algorithm using topic composition in documents, based on the theory underlying topic models. The knowledge resource adopted is the English WordNet enriched with linguistic knowledge from Wikipedia and the SemCor corpus. Besides the algorithm's efficiency, we also evaluate its effectiveness using two datasets: a general-domain dataset and a domain-specific dataset. The algorithm achieves superior performance on the general-domain dataset and superior performance among knowledge-based techniques on the domain-specific dataset.
Keywords