TEXT ANALYSIS SUBSYSTEM IN A SEARCH ENGINE FOR THE NATIONAL CORPUS OF THE CHUVASH LANGUAGE

Zheltov, P.V.; Zheltov, V.P.; Gubanov, A.R.

doi:10.18454/RULB.7.36

Russian Linguistic Bulletin (Sep 2016)

Affiliations

Zheltov, P.V.: Chuvash State University named after I.N. Ulyanov
Zheltov, V.P.: Chuvash State University named after I.N. Ulyanov
Gubanov, A.R.: Chuvash State University named after I.N. Ulyanov

DOI: https://doi.org/10.18454/RULB.7.36
Journal volume & issue: Vol. 2016, no. 3 (7)
pp. 61 – 63

Abstract

This paper discusses the text analysis subsystem of a search engine. At its current stage, the subsystem comprises the following parts: text tokenization components; a component that splits the text into sentences; and components for the morphological analysis of sentences. The paper describes special data structures, implemented as a set of classes, that hold the results produced by the search engine components. The text tokenization component converts the text into a set of tokens. The tokenization rules are defined in a configuration.
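The pipeline outlined in the abstract (rule-driven tokenization, then sentence segmentation, with results held in dedicated classes) can be illustrated with a minimal Python sketch. It assumes the tokenization rules are regular expressions tried in priority order; the rule names, the `Token` and `Sentence` classes, `Tokenizer` and `split_sentences` are all hypothetical and are not taken from the paper, whose configuration format is not given here. Morphological analysis is not covered.

```python
import re
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# Hypothetical tokenization rules as (token type, regex), in priority order.
# In the described subsystem such rules come from a configuration; they are
# hard-coded here for illustration only.
DEFAULT_RULES: Tuple[Tuple[str, str], ...] = (
    ("NUMBER", r"\d+(?:[.,]\d+)?"),
    # Letters only (Unicode-aware, so Chuvash ӑ, ӗ, ҫ, ӳ match); allows hyphenated words.
    ("WORD", r"[^\W\d_]+(?:-[^\W\d_]+)*"),
    ("PUNCT", r'[.!?,;:()"«»-]'),
)

SENTENCE_END = {".", "!", "?"}


@dataclass(frozen=True)
class Token:
    kind: str   # rule name that produced the token
    text: str
    start: int  # character offset in the source text
    end: int


@dataclass
class Sentence:
    tokens: List[Token] = field(default_factory=list)


class Tokenizer:
    def __init__(self, rules: Sequence[Tuple[str, str]] = DEFAULT_RULES):
        # Combine all rules into one alternation of named groups;
        # at each position the earliest listed rule that matches wins.
        pattern = "|".join(f"(?P<{kind}>{regex})" for kind, regex in rules)
        self._regex = re.compile(pattern)

    def tokenize(self, text: str) -> List[Token]:
        # Text not matched by any rule (e.g. whitespace) is skipped.
        return [Token(m.lastgroup, m.group(), m.start(), m.end())
                for m in self._regex.finditer(text)]


def split_sentences(tokens: List[Token]) -> List[Sentence]:
    # Close a sentence after each sentence-final punctuation token.
    sentences: List[Sentence] = []
    current: List[Token] = []
    for tok in tokens:
        current.append(tok)
        if tok.kind == "PUNCT" and tok.text in SENTENCE_END:
            sentences.append(Sentence(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(Sentence(current))
    return sentences


if __name__ == "__main__":
    text = "Эпӗ чӑвашла калаҫатӑп. Ку кӗнеке 2016 ҫулта тухнӑ!"
    for sentence in split_sentences(Tokenizer().tokenize(text)):
        print([(t.kind, t.text) for t in sentence.tokens])
```

Passing a different rule list to `Tokenizer` changes the tokenization without touching the code, which mirrors the abstract's point that the rules live in a configuration rather than in the component itself.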

Published in Russian Linguistic Bulletin

ISSN: 2313-0288 (Print); 2411-2968 (Online)
Publisher: Marina Sokolova Publishings
Country of publisher: Russian Federation
LCC subjects: Language and Literature: Philology. Linguistics
Website: http://rulb.org/