Cross-Domain Sentiment Analysis Based on Small in-Domain Fine-Tuning

Anastasia V. Kotelnikova; Sergey V. Vychegzhanin; Evgeny V. Kotelnikov

doi:10.1109/ACCESS.2023.3269720

IEEE Access (Jan 2023)

Affiliations

Anastasia V. Kotelnikova: Department of Applied Mathematics and Computer Science, Vyatka State University, Kirov, Russia
Sergey V. Vychegzhanin: Department of Applied Mathematics and Computer Science, Vyatka State University, Kirov, Russia
Evgeny V. Kotelnikov: Department of Applied Mathematics and Computer Science, Vyatka State University, Kirov, Russia

Vol. 11, pp. 41061–41074

Abstract

Significant progress has been made in sentiment analysis over the past few years, in particular through the application of deep neural language models. However, transferring trained models from one domain to another remains a problem, especially for less-studied languages such as Russian. We propose an approach to building cross-domain sentiment analysis models based on a two-stage procedure: first, we fine-tune a pre-trained RuBERT language model on a combined out-of-domain corpus, and then we fine-tune this model on a small in-domain corpus. We conducted large-scale experiments with 30 sentiment-annotated corpora across 12 domains. To increase the representation of news texts with high-quality annotation, we created a novel corpus, RuNews, containing 1,823 news articles annotated for sentiment. The results show that fine-tuning the model on a small number (a few hundred) of annotated in-domain texts can significantly improve sentiment analysis performance in a new domain (by 4.6 percentage points on average). We also obtained state-of-the-art results for 7 of the 14 test corpora.
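
The two-stage procedure described above can be illustrated with a small toy sketch. It is not the authors' implementation: a bag-of-words logistic regression (the hypothetical `ToySentimentModel`) stands in for RuBERT, and the helper names, corpus contents, and hyperparameters (`epochs`, `lr`) are illustrative assumptions. It also assumes that the "combined out-of-domain corpus" pools every available corpus except the target domain's. The key point is that stage 2 continues training from the stage-1 weights instead of starting from scratch.

```python
import math
import random

# A labelled text: (text, label), with label 1 = positive, 0 = negative.
Example = tuple[str, int]


def tokenize(text: str) -> list[str]:
    """Whitespace tokenizer (a stand-in for RuBERT's subword tokenizer)."""
    return text.lower().split()


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ToySentimentModel:
    """Hypothetical bag-of-words logistic regression standing in for RuBERT."""

    def __init__(self) -> None:
        self.weights: dict[str, float] = {}
        self.bias = 0.0

    def prob(self, text: str) -> float:
        # Tokens never seen in training contribute a weight of 0.0.
        z = self.bias + sum(self.weights.get(t, 0.0) for t in tokenize(text))
        return sigmoid(z)

    def fit(self, data: list[Example], epochs: int, lr: float, seed: int = 0) -> "ToySentimentModel":
        """SGD on log-loss, continuing from the current weights, so calling
        fit() twice gives sequential (two-stage) training."""
        rng = random.Random(seed)
        data = list(data)
        for _ in range(epochs):
            rng.shuffle(data)
            for text, label in data:
                grad = self.prob(text) - label  # d(log-loss)/d(logit)
                self.bias -= lr * grad
                for t in tokenize(text):
                    self.weights[t] = self.weights.get(t, 0.0) - lr * grad
        return self

    def predict(self, text: str) -> int:
        return int(self.prob(text) >= 0.5)


def pool_out_of_domain(corpora: dict[str, list[Example]], target: str) -> list[Example]:
    """Stage-1 training set: all corpora except those of the target domain."""
    return [ex for domain, examples in corpora.items() if domain != target for ex in examples]


def two_stage_fine_tune(corpora: dict[str, list[Example]], target: str,
                        in_domain_sample: list[Example],
                        epochs: int = 50, lr: float = 0.5) -> ToySentimentModel:
    model = ToySentimentModel()
    # Stage 1: fine-tune on the combined out-of-domain corpus.
    model.fit(pool_out_of_domain(corpora, target), epochs, lr)
    # Stage 2: continue training on a small annotated in-domain sample.
    model.fit(in_domain_sample, epochs, lr)
    return model


# Made-up toy corpora; a real evaluation would score a held-out
# in-domain test set rather than the stage-2 training texts.
corpora = {
    "movies": [("great film", 1), ("awful film", 0), ("great acting", 1), ("awful plot", 0)],
    "products": [("great phone", 1), ("awful battery", 0)],
    "news": [("economy grows", 1), ("economy collapses", 0)],
}
model = two_stage_fine_tune(corpora, "news", corpora["news"])
print(model.predict("economy grows"), model.predict("economy collapses"))  # prints "1 0"
```

In this toy, a stage-1-only model assigns the same score to both news texts because none of their words occur in the out-of-domain data. The small in-domain stage supplies the domain-specific vocabulary that the pooled corpus lacks.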

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
