Актуальные проблемы филологии и педагогической лингвистики (Mar 2018)
"Linguistic principles and computational linguistics methods for the purposes of sentiment analysis of Russian texts."
Abstract
"The article presents the current results of a research project aimed at designing a classifier of Russian texts by emotional tonality. We discuss the linguistic principles and computational linguistics methods underlying the project. Materials and Methods: The research framework integrates the theoretical basis of linguistic emotiology with sentiment analysis technologies. The methodology centers on the Naïve Bayes classifier, a supervised machine-learning algorithm, as one of the most suitable approaches for handling lexical issues in Natural Language Processing tasks. For feature selection across text classes we apply a hybrid methodology that combines the “bag of words” model with manual linguistic annotation of the data, carried out through crowdsourcing. Results: We propose a feature set for testing different machine learning algorithms that assign Russian texts to one of nine classes: texts expressing 1) interest / excitement, 2) enjoyment / joy, 3) surprise, 4) distress / anguish, 5) fear / terror, 6) shame / humiliation, 7) contempt / disgust, 8) anger / rage, or 9) “neutral” texts. The eight emotion classes are borrowed from H. Lövheim’s biological classification of emotions. The selected features include a rich inventory of linguistic items: emotional lexicon, emotion names, situation-based emotional vocabulary and verbal descriptions of behavioral manifestations of emotion. Conclusions: Designing a classifier of Russian texts by emotional tonality offers an opportunity to rethink some tenets of theoretical linguistics by testing them in applied research."
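To make the combination of Naïve Bayes and the “bag of words” model concrete, the following is a minimal sketch of a multinomial Naïve Bayes classifier over word counts with add-one (Laplace) smoothing. It is not the authors' implementation: the class name `NaiveBayes`, the whitespace tokenizer, the three-class label set and the toy training sentences are all hypothetical, and a real system would use the annotated corpus, Russian lemmatization and the full nine-class scheme described in the abstract.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for
    # proper Russian tokenization and lemmatization.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # additive smoothing constant
        self.class_counts = Counter()           # documents per class
        self.word_counts = defaultdict(Counter) # token counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data; three of the nine classes, invented sentences.
train_texts = [
    "какая радость сегодня праздник",
    "я счастлив это радость",
    "мне страшно я боюсь темноты",
    "ужас и страх я боюсь",
    "заседание состоится в понедельник",
    "отчет отправлен в понедельник",
]
train_labels = ["joy", "joy", "fear", "fear", "neutral", "neutral"]

clf = NaiveBayes().fit(train_texts, train_labels)
prediction = clf.predict("я боюсь это ужас")
print(prediction)
```

In this sketch the features are raw word counts only. The hybrid approach described in the abstract would additionally draw on manually annotated items such as emotion names and descriptions of emotional behavior, which could be added as extra tokens or weighted features.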
Keywords