Proceedings of the XXth Conference of Open Innovations Association FRUCT (Oct 2021)

Machine Learning Methods in the Problem of Attribution of Publicistic Texts of the XIX Century

  • Aleksandr Rogov
  • Nikolai Moskin
  • Kirill Kulakov
  • Roman Abramov

DOI
https://doi.org/10.23919/FRUCT53335.2021.9599961
Journal volume & issue
Vol. 30, no. 1
pp. 223 – 229

Abstract

This work considers linguostatistical methods used for the attribution (establishing the authorship) of publicistic articles of the XIX century. At that time, F. M. Dostoevsky edited and headed three journals, "Time", "Epoch" and "Citizen", which contain about 500 unattributed texts. Samples were compiled from the texts, their characteristics were studied, and the classification results obtained with several machine learning methods (decision trees, recurrent networks, parallel recurrent networks, and a transformer model) were compared. Text input, processing, and the calculation of linguostatistical parameters were carried out using an updated version of the SMALT information system.
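
As a rough illustration of what "linguostatistical parameters" of a text sample can look like, the following minimal sketch computes a few simple stylometric features in Python. It is not the SMALT implementation: the abstract does not list the parameters actually used, so the feature set, the `FUNCTION_WORDS` tuple, and the function names below are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical feature set: the parameters actually computed by SMALT
# are not listed in the abstract. These function words are an assumption.
FUNCTION_WORDS = ("и", "в", "не", "на", "что")


def split_sentences(text):
    """Split on runs of sentence-final punctuation (a crude heuristic)."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(text):
    """Return lowercase word tokens; in Python 3 this includes Cyrillic."""
    return re.findall(r"\w+", text.lower())


def features(text):
    """Map a text sample to a small dictionary of numeric features."""
    words = tokenize(text)
    sentences = split_sentences(text)
    counts = Counter(words)
    n_words = len(words) or 1  # avoid division by zero on empty input
    vec = {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_sentence_len": len(words) / (len(sentences) or 1),
    }
    # Relative frequency of each function word in the sample.
    for fw in FUNCTION_WORDS:
        vec["freq_" + fw] = counts[fw] / n_words
    return vec
```

Feature vectors of this kind, computed for samples of known authorship, could then serve as training input for a classifier such as a decision tree; the neural models named in the abstract would more likely operate on the token sequences themselves.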

Keywords