Russian Linguistic Bulletin (Sep 2016)

TEXT ANALYSIS SUBSYSTEM IN A SEARCH ENGINE FOR THE NATIONAL CORPUS OF THE CHUVASH LANGUAGE

  • Zheltov, P.V.,
  • Zheltov, V.P.,
  • Gubanov, A.R.

DOI
https://doi.org/10.18454/RULB.7.36
Journal volume & issue
Vol. 2016, no. 3 (7)
pp. 61 – 63

Abstract


This paper discusses the text analysis subsystem of a search engine. At this stage, the subsystem consists of the following components: a text tokenization component, a sentence segmentation component, and a component for morphological analysis of sentences. Special data structures, in the form of a set of classes, describe the results produced by the search engine components. The text tokenization component converts the text into a set of tokens. The rules of tokenization are defined through configuration.
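The first two stages described above could be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the abstract does not give the configuration format or the class layout, so the `TOKEN_RULES` table, the `Token` class, and the functions `tokenize` and `split_sentences` are all hypothetical names chosen for the example. Morphological analysis is omitted because the abstract gives no detail about it.

```python
import re
from dataclasses import dataclass

# Hypothetical rule table standing in for the tokenization configuration:
# each entry is (token kind, regular expression). Order sets priority.
TOKEN_RULES = [
    ("WORD", r"[^\W\d_]+(?:-[^\W\d_]+)*"),  # letters, optionally hyphenated
    ("NUMBER", r"\d+"),
    ("PUNCT", r"[^\w\s]"),
]


@dataclass
class Token:
    """Hypothetical result class for one token."""
    kind: str   # name of the rule that matched
    text: str   # matched substring
    start: int  # character offset in the source text


def tokenize(text, rules=TOKEN_RULES):
    """Convert text into a list of tokens using the configured rules."""
    pattern = re.compile("|".join(f"(?P<{kind}>{rx})" for kind, rx in rules))
    return [Token(m.lastgroup, m.group(), m.start())
            for m in pattern.finditer(text)]


def split_sentences(tokens, terminators=(".", "!", "?")):
    """Group tokens into sentences, closing one at each terminator token."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok.kind == "PUNCT" and tok.text in terminators:
            sentences.append(current)
            current = []
    if current:  # trailing tokens without a terminator
        sentences.append(current)
    return sentences
```

Keeping the rules in a data table rather than in code means they can be changed without touching the tokenizer, which is one plausible reading of defining tokenization rules through configuration.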

Keywords