Information (Mar 2019)
Word Sense Disambiguation Studio: A Flexible System for WSD Feature Extraction
Abstract
The paper presents a flexible system for extracting features and creating training and test examples for solving the all-words sense disambiguation (WSD) task. The system allows integrating word and sense embeddings as part of an example description. The system possesses two unique features distinguishing it from all similar WSD systems: the ability to construct a special compressed representation for word embeddings and the ability to construct training and test sets of examples with different data granularity. The first feature allows generation of data sets with quite small dimensionality, which can be used for training highly accurate classifiers of different types. The second feature allows generating sets of examples that can be used for training classifiers specialized in disambiguating a concrete word, words belonging to the same part-of-speech (POS) category or all open class words. Intensive experimentation has shown that classifiers trained on examples created by the system outperform the standard baselines for measuring the behaviour of all-words WSD classifiers.
Keywords