Enhancing Imbalanced Sentiment Analysis: A GPT-3-Based Sentence-by-Sentence Generation Approach

Cici Suhaeni; Hwan-Seung Yong

doi:10.3390/app14020622

Applied Sciences (Jan 2024)

Enhancing Imbalanced Sentiment Analysis: A GPT-3-Based Sentence-by-Sentence Generation Approach

Cici Suhaeni,
Hwan-Seung Yong

Affiliations

Cici Suhaeni: Department of Computer Science and Engineering, Ewha Womans University, Seoul 03760, Republic of Korea
Hwan-Seung Yong: Department of Computer Science and Engineering, Ewha Womans University, Seoul 03760, Republic of Korea

DOI: https://doi.org/10.3390/app14020622
Journal volume & issue: Vol. 14, no. 2
p. 622

Abstract

Read online

This study addresses the challenge of class imbalance in sentiment analysis by utilizing synthetic data to balance training datasets. We introduce an innovative approach using the GPT-3 model’s sentence-by-sentence generation technique to generate synthetic data, specifically targeting underrepresented negative and neutral sentiments. Our method aims to align these minority classes with the predominantly positive sentiment class in a Coursera course review dataset, with the goal of enhancing the performance of sentiment classification. This research demonstrates that our proposed method successfully enhances sentiment classification performance, as evidenced by improved accuracy and F1-score metrics across five deep-learning models. However, when compared to our previous research utilizing fine-tuning techniques, the current method shows a relative shortfall. The fine-tuning approach yields better results in all models tested, indicating the importance of data novelty and diversity in synthetic data generation. In terms of the deep-learning model used for classification, the notable finding is the significant performance improvement of the Recurrent Neural Network (RNN) model compared to other models like CNN, LSTM, BiLSTM, and GRU, highlighting the impact of the model choice and architecture depth. This study emphasizes the critical role of synthetic data quality and strategic deep-learning model implementation in sentiment analysis. The results suggest that the careful consideration of training data and model attributes is vital for optimal sentiment classification.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci

About the journal

Abstract

Keywords