BertSent: Transformer-Based Model for Sentiment Analysis of Penta-Class Tweet Classification

Maram Fahaad Almufareh; NZ Jhanjhi; Navid Ali Khan; Saleh Naif Almuayqil; Mamoona Humayun; Danish Javed

doi:10.1109/ACCESS.2024.3515836

IEEE Access (Jan 2024)

Affiliations

Maram Fahaad Almufareh: Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakakah, Saudi Arabia
NZ Jhanjhi: School of Computer Science, SCS, Taylor’s University, Subang Jaya, Malaysia
Navid Ali Khan: School of Computer Science, SCS, Taylor’s University, Subang Jaya, Malaysia
Saleh Naif Almuayqil: School of Computer Science, SCS, Taylor’s University, Subang Jaya, Malaysia
Mamoona Humayun: Department of Computing, School of Arts Humanities and Social Sciences, University of Roehampton, London, U.K.
Danish Javed: School of Computer Science, SCS, Taylor’s University, Subang Jaya, Malaysia

DOI: https://doi.org/10.1109/ACCESS.2024.3515836
Journal volume & issue: Vol. 12
pp. 196803 – 196817

Abstract

Sentiment analysis (SA) is a popular method for obtaining relevant and subjective information from textual content. Sentiment analysis of multimedia material is helpful for various reasons, but it is considered challenging because the messages are often brief, unstructured, and linguistically inconsistent. Previous research on sentiment analysis has usually focused on dual- or triple-class analysis using older language modeling techniques, and penta-class classification tasks have received less attention. To address this challenge, we present a transformer-based model called BertSent that combines ordered preprocessing steps with transformer-based tokenization and optimization to obtain the best sentiment analysis results when data is limited. Our framework handles penta-class classification of tweets, and to that end we combine several preprocessing techniques to fine-tune our transformer-based model. We employ resampling techniques, both over-sampling and under-sampling, to address class imbalance in the penta-class setup, which improves model generalization and performance. This article also compares the performance of the transformer-based model against a variety of deep learning-based models, including bi-directional models. The experiments and results support our model’s remarkable performance given the limited data and the penta-class classification challenge. Notably, under-sampling and over-sampling yield similar results: the BertSent model combined with over-sampling performs slightly better, with 75.3% test accuracy, compared with 75.1% for under-sampling.
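
The abstract mentions over-sampling and under-sampling for class imbalance but does not say which variant was used. As a rough illustration only, the sketch below shows plain random resampling over five sentiment classes. The function name `resample`, the toy tweets, and the class counts are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def resample(texts, labels, mode="over", seed=0):
    """Balance classes by random resampling (illustrative, not the paper's code).

    mode="over":  duplicate random samples until every class matches the largest.
    mode="under": keep a random subset so every class matches the smallest.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)

    # Group the texts by their class label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    sizes = [len(items) for items in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    balanced = []
    for label, items in by_class.items():
        if mode == "over":
            # Sample with replacement to fill the gap up to the target size.
            extra = [rng.choice(items) for _ in range(target - len(items))]
            picked = items + extra
        else:
            # Sample without replacement down to the target size.
            picked = rng.sample(items, target)
        balanced.extend((text, label) for text in picked)

    rng.shuffle(balanced)
    return balanced


# Hypothetical imbalanced penta-class data: labels 0..4 with 5, 1, 2, 3, 4 samples.
labels = [0] * 5 + [1] * 1 + [2] * 2 + [3] * 3 + [4] * 4
texts = [f"tweet {i}" for i in range(len(labels))]

over = resample(texts, labels, mode="over")
under = resample(texts, labels, mode="under")
print(Counter(label for _, label in over))
print(Counter(label for _, label in under))
```

With this toy input, over-sampling leaves every class with five examples and under-sampling leaves every class with one. In practice, resampling is usually applied to the training split only, so that duplicated tweets do not leak into the test set.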

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
