Eesti Rakenduslingvistika Ühingu Aastaraamat (May 2016)

Eesti keele ühendverbide kompositsionaalsuse määramine (Determining the compositionality of Estonian particle verbs)

  • Eleri Aedmaa

DOI
https://doi.org/10.5128/ERYa12.01
Journal volume & issue
Vol. 12
pp. 5 – 23

Abstract

This article has two aims: to classify Estonian particle verbs automatically and to detect their degree of compositionality. To group the particle verbs, several lexical association measures (AMs) are compared. To detect the degree of compositionality, a model based on distributional semantics is used. The experiment is carried out with the word2vec tool, using a continuous bag-of-words model, which predicts a word from its context. The comparison revealed that none of the AMs used achieves precision high enough to classify the particle verbs. Hence, it can be assumed that Estonian particle verbs cannot be divided cleanly into compositional and non-compositional classes, but rather populate a continuum between entirely compositional and entirely non-compositional expressions. The experiment assessing the degree of compositionality with a distributional semantic model proved successful: it is demonstrated that the cosine similarity value can predict the degree of compositionality of particle verbs. However, to evaluate the method introduced here, it is important to create a ranking of human judgements on semantic compositionality for a series of particle verbs and the base verbs to which they correspond.
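
As a rough illustration of how cosine similarity might serve as a compositionality score, the sketch below ranks particle verbs by the similarity between their vector and the vector of their base verb. Several things here are assumptions rather than details from the article: the comparison of a particle verb with its base verb, the treatment of each particle verb as a single token (for example `ära_minema`), the helper names `cosine_similarity` and `rank_by_compositionality`, and above all the three-dimensional vectors, which are invented toy values and not word2vec output.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Invented toy vectors (assumption): real vectors would come from a trained
# distributional model such as a word2vec CBOW model.
toy_vectors = {
    "minema": [0.9, 0.1, 0.2],       # 'to go'
    "ära_minema": [0.8, 0.2, 0.3],   # 'to go away'
    "andma": [0.1, 0.9, 0.1],        # 'to give'
    "alla_andma": [0.7, 0.1, 0.7],   # 'to give up, surrender'
}

# Hypothetical (particle verb, base verb) pairs chosen for illustration.
pairs = [("alla_andma", "andma"), ("ära_minema", "minema")]


def rank_by_compositionality(pairs, vectors):
    """Sort pairs from most to least similar to their base verb."""
    scored = [
        (pv, base, cosine_similarity(vectors[pv], vectors[base]))
        for pv, base in pairs
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


ranking = rank_by_compositionality(pairs, toy_vectors)
for particle_verb, base_verb, score in ranking:
    print(f"{particle_verb} ~ {base_verb}: {score:.3f}")
```

Under this reading, a higher score would mark a particle verb as closer to the compositional end of the continuum. Such a ranking could then be compared with the human judgements that the abstract calls for.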

Keywords