IEEE Access (Jan 2020)

Fine-Tuning BERT for Multi-Label Sentiment Analysis in Unbalanced Code-Switching Text

  • Tiancheng Tang,
  • Xinhuai Tang,
  • Tianyi Yuan

DOI
https://doi.org/10.1109/access.2020.3030468
Journal volume & issue
Vol. 8
pp. 193248 – 193256

Abstract

Previous research on sentiment analysis has mainly focused on binary or ternary sentiment analysis of monolingual texts. However, on today's social media platforms such as micro-blogs, emotions are often expressed in bilingual or multilingual text, known as code-switching text, and people's emotions are complex, including happiness, sadness, anger, fear, surprise, etc. Several emotions may appear together, and the proportion of each emotion in code-switching text is often unbalanced. Inspired by the recently proposed BERT model, in this paper we investigate how to fine-tune BERT for multi-label sentiment analysis of code-switching text. Our investigation covers the selection of pre-trained models and the fine-tuning methods of BERT for this task. To deal with the unbalanced distribution of emotions, we propose a method based on data augmentation, undersampling, and ensemble learning that produces balanced samples and trains different multi-label BERT classifiers. Our model combines the predictions of the individual classifiers to obtain the final outputs. Experiments on the NLPCC 2018 Shared Task 1 dataset show the effectiveness of our model on unbalanced code-switching text. The F1-score of our model is higher than that of many previous models.
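To make the approach described in the abstract concrete, the sketch below (not the authors' code) shows one common way to realize a multi-label BERT classifier with a sigmoid/binary cross-entropy head, and an ensemble that averages the per-label probabilities of several such classifiers, each assumed to be trained on a differently re-balanced (augmented/undersampled) split. The checkpoint name, the five-emotion label set, and the probability-averaging rule are illustrative assumptions, not details from the paper.

```python
# Hedged sketch of a multi-label BERT classifier plus a simple ensemble.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

LABELS = ["happiness", "sadness", "anger", "fear", "surprise"]  # assumed label set

class MultiLabelBert(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased", num_labels=len(LABELS)):
        super().__init__()
        # A multilingual checkpoint is a natural choice for code-switching text (assumption).
        self.bert = BertModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, labels=None):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        logits = self.classifier(self.dropout(out.pooler_output))  # one logit per emotion
        if labels is not None:
            # Multi-label setup: an independent sigmoid per label with binary cross-entropy loss.
            loss = nn.BCEWithLogitsLoss()(logits, labels.float())
            return loss, logits
        return logits

@torch.no_grad()
def ensemble_predict(models, tokenizer, texts, threshold=0.5):
    """Average sigmoid probabilities over classifiers trained on balanced splits."""
    enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
    probs = torch.stack(
        [torch.sigmoid(m(enc["input_ids"], enc["attention_mask"])) for m in models]
    ).mean(dim=0)
    # A label is emitted when its averaged probability passes the threshold.
    return (probs >= threshold).int()
```

In this reading, the re-balancing happens upstream (each ensemble member sees a balanced training split built by augmentation and undersampling), and the ensemble step only combines probabilities; the paper's exact combination rule may differ.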

Keywords