Machine Learning and Knowledge Extraction (Jan 2025)
Unsupervised Word Sense Disambiguation Using Transformer’s Attention Mechanism
Abstract
Transformer models produce rich text representations that have driven recent advances in natural language understanding. Using the Transformer's attention mechanism, which acts as a learned language memory trained on tens of billions of words, a word sense disambiguation (WSD) algorithm can construct a more faithful vector representation of the context of the word to be disambiguated. Working with a set of 34 noun, verb, adjective and adverb lemmas selected from the National Reference Corpus of Romanian (CoRoLa), we show that, by using BERT's attention heads at all hidden layers, we can devise contextual vectors of the target lemma that yield better clusters of the lemma's senses than those obtained with standard BERT embeddings. We also show that, by automatically translating the Romanian example sentences of the target lemma into English, we can reliably infer the number of senses with which the target lemma appears in CoRoLa. Finally, we describe an unsupervised WSD algorithm that, given a Romanian BERT model and a few example sentences for each of the target lemma's senses, labels the induced Romanian sense clusters with the appropriate sense labels, achieving an average accuracy of 64%.
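To make the idea of attention-derived contextual vectors concrete, the following minimal sketch (not the authors' implementation) shows one plausible way to build a fixed-size context vector for a target word from BERT's attention heads at all hidden layers: the target token's attention row, averaged over heads, is used to weight the layer's hidden states, and the per-layer vectors are averaged. The Romanian BERT checkpoint name, the pooling scheme, and the simplified sub-token lookup are all illustrative assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed Romanian BERT checkpoint, chosen for illustration only.
MODEL_NAME = "dumitrescustefan/bert-base-romanian-cased-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def attention_context_vector(sentence: str, target: str) -> torch.Tensor:
    """Build a context vector for `target` from the attention it pays to
    the rest of the sentence, aggregated over all heads and layers."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, output_attentions=True, output_hidden_states=True)

    # Simplified lookup: position of the target's first sub-token.
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    idx = tokens.index(tokenizer.tokenize(target)[0])

    layer_vecs = []
    for layer, att in enumerate(out.attentions):
        # att: (1, heads, seq, seq); target token's attention row,
        # averaged over heads -> weights over all tokens: (seq,)
        weights = att[0, :, idx, :].mean(dim=0)
        # Hidden states output by this layer: (seq, hidden)
        states = out.hidden_states[layer + 1][0]
        # Attention-weighted average of the layer's token states: (hidden,)
        layer_vecs.append(weights @ states)
    # Average over layers -> one fixed-size vector per occurrence.
    return torch.stack(layer_vecs).mean(dim=0)

# One such vector per example sentence can then be clustered (e.g., with
# k-means) to induce the target lemma's senses.
vec = attention_context_vector("Banca din parc era ocupată.", "banca")
print(vec.shape)
```

A fixed-size vector per occurrence is what makes the subsequent sense clustering straightforward; how the paper actually pools heads and layers may differ from this sketch.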
Keywords