Machine Learning and Knowledge Extraction (Jan 2025)
Unsupervised Word Sense Disambiguation Using Transformer’s Attention Mechanism
Abstract
Transformer models produce rich text representations that have driven recent advances in natural language understanding. Using the Transformer's attention mechanism, which acts as a learned language memory trained on tens of billions of words, a word sense disambiguation (WSD) algorithm can construct a more faithful vector representation of the context of the word to be disambiguated. Working with a set of 34 noun, verb, adjective and adverb lemmas selected from the National Reference Corpus of Romanian (CoRoLa), we show that, by using BERT's attention heads at all hidden layers, we can devise contextual vectors of the target lemma that yield better clusters of the lemma's senses than those obtained with standard BERT embeddings. We also show that, by automatically translating the Romanian example sentences of the target lemma into English, we can reliably infer the number of senses with which the target lemma appears in CoRoLa. Finally, we describe an unsupervised WSD algorithm that, given a Romanian BERT model and a few example sentences for each of the target lemma's senses, labels the induced Romanian sense clusters with the appropriate sense labels, achieving an average accuracy of 64%.
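To make the idea of attention-derived contextual vectors concrete, the following minimal sketch (not the authors' implementation) shows one plausible way to build a fixed-size context vector for a target word from BERT's attention heads at all hidden layers: the target token's attention row, averaged over heads, is used to weight the layer's hidden states, and the per-layer vectors are averaged. The Romanian BERT checkpoint name, the pooling scheme, and the simplified sub-token lookup are all illustrative assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed Romanian BERT checkpoint, chosen for illustration only.
MODEL_NAME = "dumitrescustefan/bert-base-romanian-cased-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def attention_context_vector(sentence: str, target: str) -> torch.Tensor:
    """Build a context vector for `target` from the attention it pays to
    the rest of the sentence, aggregated over all heads and layers."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, output_attentions=True, output_hidden_states=True)

    # Simplified lookup: position of the target's first sub-token.
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    idx = tokens.index(tokenizer.tokenize(target)[0])

    layer_vecs = []
    for layer, att in enumerate(out.attentions):
        # att: (1, heads, seq, seq); target token's attention row,
        # averaged over heads -> weights over all tokens: (seq,)
        weights = att[0, :, idx, :].mean(dim=0)
        # Hidden states output by this layer: (seq, hidden)
        states = out.hidden_states[layer + 1][0]
        # Attention-weighted average of the layer's token states: (hidden,)
        layer_vecs.append(weights @ states)
    # Average over layers -> one fixed-size vector per occurrence.
    return torch.stack(layer_vecs).mean(dim=0)

# One such vector per example sentence can then be clustered (e.g., with
# k-means) to induce the target lemma's senses.
vec = attention_context_vector("Banca din parc era ocupată.", "banca")
print(vec.shape)
```

A fixed-size vector per occurrence is what makes the subsequent sense clustering straightforward; how the paper actually pools heads and layers may differ from this sketch.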
Keywords