Data Science Journal (Aug 2019)

Application of Natural Language Processing Algorithms to the Task of Automatic Classification of Russian Scientific Texts

  • Aleksandr Romanov,
  • Konstantin Lomotin,
  • Ekaterina Kozlova

DOI
https://doi.org/10.5334/dsj-2019-037
Journal volume & issue
Vol. 18, no. 1

Abstract

This work studies the applicability of modern machine learning methods to the automatic classification of scientific articles and abstracts. To this end, artificial neural networks, random forest, logistic regression, and support vector machine models were examined, taking into account a characteristic feature of scientific texts: a large number of terms specific to individual categories. The stages of data collection and text feature extraction are considered separately. The results are used in the development of a decision support system that assigns scientific texts to the code of a department or abstract journal of the All-Russian Institute of Scientific and Technical Information of the Russian Academy of Sciences.

Keywords