Russian Journal of Linguistics (Dec 2023)
Verb database: Structure, clusters and options
Abstract
The content and size of language corpora make it possible to obtain reliable information about how a particular linguistic unit is actually used. A large number of corpora now exist for different languages, and the technologies for building them continue to improve. Nevertheless, these resources have certain problems and limitations when used in comparative studies, since corpus users need to work with data that have been tagged according to annotation protocols. The article presents the structure and functionality of the supracorpora verb database (SVD), developed on the basis of the parallel Russian-French subcorpus of the Russian National Corpus (RNC), and compares the capabilities of the two resources. The database described here is a pilot version of the final software, which is still under development and testing. It consists of several clusters designed for linguistic tasks such as studying the specifics of grammatical semantics and the distribution of verb forms in Russian and French, and identifying polysemantic structures in the two languages, which in turn helps to verify our understanding of the linguistic worldview of Russian and French speakers. The study shows that the way SVD clusters function makes it possible to examine both the individual characteristics of verbs and the semantics of verbal lexemes and collocations. Manual annotation enables users to identify systematic asymmetry of verb forms as well as cases of contextual and low-frequency asymmetry. SVD can therefore be used in language pedagogy, in teaching and studying discursive grammar, and in analysing the variability of translation models.
Keywords