Novye Issledovaniâ Tuvy (Dec 2016)

The structure of an entry in the National corpus of Tuvan language

  • Mengi V. Ondar

Journal volume & issue
Vol. 0, no. 4

Abstract

Contemporary information technologies and mathematical modelling have made creating corpora of natural languages significantly easier. A corpus is an information and reference system based on a collection of digitally processed texts. It includes various written and oral texts in the given language, a set of dictionaries, and markup, that is, information on the properties of the texts. It is the presence of markup that distinguishes a corpus from an electronic library. At the moment, national corpora are being set up for many languages of the Russian Federation, including those of the Turkic peoples. Faculty members, postgraduate and undergraduate students at Tuvan State University and Siberian Federal University are working on the National corpus of Tuvan language. This article describes the structure of a dictionary entry in that corpus. The corpus database comprises the following tables: MAIN, the headword table; RUS, ENG and GER, which hold translations of the headword into three languages; and MORPHOLOGY, which contains morphological data on the headword. The database is built in Microsoft Office Access. Working with the corpus dictionary includes the following functions: adding, editing and removing an entry; searching for an entry (with transcription); and setting and visualizing the morphological features of a headword. The project allows us to view the corpus dictionary as a multi-structure entity with a complex hierarchical structure, with the dictionary entry as its key component. The corpus dictionary we developed can be used for studying the pronunciation, orthography and word analysis of the Tuvan language, as well as for searching for words and collocations in the texts included in the corpus.
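
The table layout named in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: SQLite (through Python's standard sqlite3 module) stands in for Microsoft Office Access, only the table names MAIN, RUS, ENG, GER and MORPHOLOGY come from the abstract, and every column name, function name and the sample entry are hypothetical. It is not the authors' implementation.

```python
import sqlite3

# Only the table names come from the abstract; all columns are assumptions.
SCHEMA = """
CREATE TABLE MAIN (
    id INTEGER PRIMARY KEY,
    headword TEXT NOT NULL UNIQUE,
    transcription TEXT
);
CREATE TABLE RUS (main_id INTEGER NOT NULL REFERENCES MAIN(id) ON DELETE CASCADE,
                  translation TEXT NOT NULL);
CREATE TABLE ENG (main_id INTEGER NOT NULL REFERENCES MAIN(id) ON DELETE CASCADE,
                  translation TEXT NOT NULL);
CREATE TABLE GER (main_id INTEGER NOT NULL REFERENCES MAIN(id) ON DELETE CASCADE,
                  translation TEXT NOT NULL);
CREATE TABLE MORPHOLOGY (main_id INTEGER NOT NULL REFERENCES MAIN(id) ON DELETE CASCADE,
                         feature TEXT NOT NULL,
                         value TEXT NOT NULL);
"""


def open_dictionary(path=":memory:"):
    """Create the hypothetical dictionary database and return a connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, headword, transcription, rus, eng, ger, morphology):
    """Add a headword with one translation per language and its features."""
    cur = conn.execute(
        "INSERT INTO MAIN (headword, transcription) VALUES (?, ?)",
        (headword, transcription),
    )
    main_id = cur.lastrowid
    for table, text in (("RUS", rus), ("ENG", eng), ("GER", ger)):
        conn.execute(
            f"INSERT INTO {table} (main_id, translation) VALUES (?, ?)",
            (main_id, text),
        )
    conn.executemany(
        "INSERT INTO MORPHOLOGY (main_id, feature, value) VALUES (?, ?, ?)",
        [(main_id, feature, value) for feature, value in morphology.items()],
    )
    conn.commit()
    return main_id


def find_entry(conn, query):
    """Look an entry up by headword or by transcription; None if absent."""
    row = conn.execute(
        "SELECT id, headword, transcription FROM MAIN "
        "WHERE headword = ? OR transcription = ?",
        (query, query),
    ).fetchone()
    if row is None:
        return None
    main_id, headword, transcription = row
    entry = {"headword": headword, "transcription": transcription}
    for table in ("RUS", "ENG", "GER"):
        rows = conn.execute(
            f"SELECT translation FROM {table} WHERE main_id = ?", (main_id,)
        )
        entry[table.lower()] = [r[0] for r in rows]
    entry["morphology"] = dict(
        conn.execute(
            "SELECT feature, value FROM MORPHOLOGY WHERE main_id = ?", (main_id,)
        )
    )
    return entry


def remove_entry(conn, headword):
    """Delete a headword; dependent rows go with it via the cascade."""
    cur = conn.execute("DELETE FROM MAIN WHERE headword = ?", (headword,))
    conn.commit()
    return cur.rowcount > 0


if __name__ == "__main__":
    conn = open_dictionary()
    # Illustrative sample entry, not taken from the corpus itself.
    add_entry(conn, "ном", "nom", "книга", "book", "Buch",
              {"part_of_speech": "noun"})
    print(find_entry(conn, "nom"))
```

Keeping the translations and morphological features in separate tables keyed to MAIN mirrors the hierarchy the abstract describes, in which the dictionary entry is the central component and the other tables hang off it.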

Keywords