Exploring transformer models in the sentiment analysis task for the under-resource Bengali language

Md. Nesarul Hoque; Umme Salma; Md. Jamal Uddin; Md. Martuza Ahamad; Sakifa Aktar

Natural Language Processing Journal (Sep 2024)

Affiliations

Md. Nesarul Hoque: Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh; Corresponding author.
Umme Salma: Department of Computer Science and Engineering, Bangladesh University, Dhaka 1207, Bangladesh
Md. Jamal Uddin: Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh
Md. Martuza Ahamad: Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh
Sakifa Aktar: Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh

Journal volume & issue: Vol. 8, p. 100091

Abstract

Sentiment analysis (SA) determines whether a comment or piece of feedback from an online user or customer about an object, such as a movie, drama, or food item, is positive or negative. This sentiment can benefit various decision-making processes. Many studies have addressed sentiment identification from text in high-resource languages such as English. However, few studies exist for the under-resourced Bengali language, owing to the lack of benchmark corpora, the limitations of text-processing software, and similar constraints. Furthermore, there is still room to improve classification performance on the SA task. In this research, we experiment on a recognized Bengali dataset of 11,807 comments to classify sentiment as positive or negative. We employ five state-of-the-art transformer-based pretrained models, namely multilingual Bidirectional Encoder Representations from Transformers (mBERT), BanglaBERT, Bangla-Bert-Base, DistilmBERT, and XLM-RoBERTa-base (XLM-R-base), with hyperparameter tuning. We then propose a combined model, named Transformer-ensemble, which achieves strong detection performance, with an accuracy of 95.97% and an F1-score of 95.96%, compared with recent existing methods on the Bengali SA task.
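The abstract does not state how the Transformer-ensemble combines its five member models. As a minimal sketch only, assuming soft voting (averaging each model's class probabilities and taking the most probable class), the idea of an ensemble and of the two reported metrics might look as follows. The model names are taken from the abstract, but the probabilities, the labels, the helper names `soft_vote`, `accuracy`, and `f1_positive`, and the choice of positive-class F1 are all hypothetical and are not the authors' implementation or data.

```python
from typing import Dict, List, Sequence

# ASSUMPTION: soft voting is used purely for illustration; the abstract
# does not say which combination rule the Transformer-ensemble applies.
def soft_vote(prob_by_model: Dict[str, Sequence[Sequence[float]]]) -> List[int]:
    """Average per-class probabilities across models, then take the argmax."""
    models = list(prob_by_model.values())
    n_samples = len(models[0])
    predictions = []
    for i in range(n_samples):
        n_classes = len(models[0][i])
        mean_probs = [
            sum(m[i][c] for m in models) / len(models) for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=mean_probs.__getitem__))
    return predictions

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_positive(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1-score of the positive class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy, made-up probabilities [P(negative), P(positive)] for four comments.
toy_probs = {
    "mBERT":      [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.6, 0.4]],
    "BanglaBERT": [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.3, 0.7]],
    "XLM-R-base": [[0.7, 0.3], [0.3, 0.7], [0.1, 0.9], [0.4, 0.6]],
}
toy_labels = [0, 1, 1, 0]  # 0 = negative, 1 = positive (made up)

toy_preds = soft_vote(toy_probs)
toy_accuracy = accuracy(toy_labels, toy_preds)
toy_f1 = f1_positive(toy_labels, toy_preds)
```

In this toy run, individual models disagree on the second and fourth comments, and the averaged probabilities decide the final label. Other combination rules, such as majority voting or weighted averaging, would fit the same interface.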

Published in Natural Language Processing Journal

ISSN: 2949-7191 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing
Website: https://www.sciencedirect.com/journal/natural-language-processing-journal
