Applied Sciences (Aug 2020)

Zero-Shot Learning for Cross-Lingual News Sentiment Classification

  • Andraž Pelicon,
  • Marko Pranjić,
  • Dragana Miljković,
  • Blaž Škrlj,
  • Senja Pollak

DOI
https://doi.org/10.3390/app10175993
Journal volume & issue
Vol. 10, no. 17
p. 5993

Abstract

In this paper, we address the task of zero-shot cross-lingual news sentiment classification. Given an annotated dataset of positive, neutral, and negative news in Slovene, the aim is to develop a news classification system that assigns a sentiment category not only to Slovene news, but also to news in another language, without requiring any training data in that language. Our system is based on the multilingual BERT model. We test different approaches for handling long documents and propose a novel technique for sentiment enrichment of the BERT model as an intermediate training step. With the proposed approach, we achieve state-of-the-art performance on the sentiment analysis task on Slovene news. We evaluate the zero-shot cross-lingual capabilities of our system on a novel news sentiment test set in Croatian. The results show that the cross-lingual approach also outperforms the majority classifier by a large margin, as well as all settings without sentiment enrichment in pre-training.

Keywords