Journal of Information and Telecommunication (Apr 2024)
Employing synthetic data for addressing the class imbalance in aspect-based sentiment classification
Abstract
The class imbalance problem, in which the distribution of classes in the training data is unequal or skewed, is a prevalent one. It can bias classifiers toward the majority classes and degrade performance on the minority class. In this paper, we address the class imbalance problem in datasets for aspect-based sentiment classification (AbSC), a type of fine-grained sentiment analysis in which sentiments about particular aspects of an entity are extracted. We address the imbalance by creating synthetic data. Two techniques are proposed for synthetic data generation: paraphrasing using a fine-tuned PEGASUS model and backtranslation using the M2M100 neural machine translation model. We compare these techniques with two other class-balancing techniques: weighted oversampling and cross-entropy loss with class weights. An extensive experimental study was conducted on three benchmark datasets of restaurant reviews: SemEval-2014, SemEval-2015, and SemEval-2016. We applied these methods to a BERT-based deep learning model for aspect-based sentiment classification and studied the effect of balancing the data on the performance of the model. Our proposed balancing technique, using synthetic data, yielded better results than the other two existing methods for dealing with multi-class imbalance.
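The two baseline balancing techniques named in the abstract, weighted oversampling and cross-entropy loss with class weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the inverse-frequency weighting formula, the function names, and the toy label counts are all assumptions chosen for illustration, and the abstract does not state how the weights were actually computed.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted mean of -log p(true class), normalised by total weight.

    probs is a list of dicts mapping each class to its predicted probability.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


def oversampling_probabilities(labels, weights):
    """Per-example sampling probabilities proportional to the class weight."""
    raw = [weights[y] for y in labels]
    z = sum(raw)
    return [r / z for r in raw]


# Hypothetical, skewed toy label distribution (not taken from SemEval).
labels = ["positive"] * 6 + ["negative"] * 3 + ["neutral"] * 1
weights = inverse_frequency_weights(labels)
sample_probs = oversampling_probabilities(labels, weights)

# Total sampling mass per class; with these weights each class gets an equal share.
class_mass = Counter()
for y, p in zip(labels, sample_probs):
    class_mass[y] += p

# A uniform predictor, used only to exercise the loss function.
uniform = [{c: 1 / 3 for c in weights} for _ in labels]
loss = weighted_cross_entropy(uniform, labels, weights)
```

Under these assumed weights, rare classes contribute more to the loss per example, and sampling with `sample_probs` draws each class equally often in expectation. The synthetic-data approach proposed in the paper differs in that it adds new minority-class sentences rather than reweighting or repeating existing ones.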
Keywords