Novye Issledovaniâ Tuvy (Dec 2016)
A metatextual markup in the national corpus of Tuvan language: the structure and functionality
Abstract
Creating natural language corpora helps solve a number of philological and purely linguistic problems for many languages of the peoples of the Russian Federation. The National Corpus of the Tuvan Language (http://www.tuvancorpus.ru/) is one such product, jointly developed by faculty and students at two universities in Krasnoyarsk and Kyzyl. The article presents the corpus's meta-markup system; meta-markup forms the most important part of the search functionality in any corpus. Meta-markup is the assignment of parameters that characterize a text as a whole. Within a corpus, it makes it possible to search for texts and select them for inclusion in subcorpora by the presence of one or more features. Consequently, the larger the set of such features recorded for each text, the wider the search functionality becomes for various philological and linguistic purposes. The meta-markup system for texts included in the National Corpus of the Tuvan Language may include up to 18 parameters, such as the author’s name and gender, the title and creation date (year) of the text, its functional sphere, topic, subject area, the time and setting of the events described in it, the text’s classification by type of spoken language or literary genre and style, its source, the name of the periodical it appeared in, publisher, publication date, medium, comments, as well as some features of its audience, such as age and education level.
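To illustrate how text-level parameters can drive subcorpus selection, here is a minimal Python sketch. It does not reproduce the corpus's actual implementation: the `TextMeta` record, its field names, the `select_subcorpus` helper, and the sample records are all hypothetical, and only a handful of the parameters listed above are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextMeta:
    """Hypothetical metadata record; models only a few of the parameters."""
    title: str
    author: Optional[str] = None
    author_gender: Optional[str] = None
    year_created: Optional[int] = None
    functional_sphere: Optional[str] = None
    genre: Optional[str] = None
    audience_age: Optional[str] = None


def select_subcorpus(texts, **criteria):
    """Return the texts whose metadata equals every given criterion."""
    unknown = set(criteria) - set(TextMeta.__dataclass_fields__)
    if unknown:
        raise ValueError(f"unknown metadata fields: {sorted(unknown)}")
    return [
        t for t in texts
        if all(getattr(t, name) == value for name, value in criteria.items())
    ]


# Invented sample records, for illustration only.
corpus = [
    TextMeta("Text A", author_gender="f", year_created=1998,
             functional_sphere="fiction", genre="novel"),
    TextMeta("Text B", author_gender="m", year_created=2005,
             functional_sphere="journalism"),
    TextMeta("Text C", author_gender="f", year_created=2010,
             functional_sphere="fiction", genre="short story"),
]

# Select a subcorpus by two features at once.
fiction_by_women = select_subcorpus(
    corpus, functional_sphere="fiction", author_gender="f"
)
print([t.title for t in fiction_by_women])
```

Each additional field on the record becomes another axis along which texts can be filtered, which is the sense in which a richer feature set widens the search functionality.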