Proceedings of the XXth Conference of Open Innovations Association FRUCT (Apr 2022)

Topical Text Classification of Russian News: a Comparison of BERT and Standard Models

  • Ksenia Lagutina

DOI
https://doi.org/10.23919/FRUCT54823.2022.9770920
Journal volume & issue
Vol. 31, no. 1
pp. 160–166

Abstract

This paper addresses single-label topical classification of Russian news. The author compares BERT features with standard character-, word-, and structure-level features as text models. Experiments on OpenCorpora show that the BERT model outperforms the standard ones and achieves good classification quality on a small dataset of long news texts. Comparison with state-of-the-art research suggests that BERT can serve as a baseline for future work on the analysis of Russian-language texts.

Keywords