Emotion Classification in Bangla Text Data Using Gaussian Naive Bayes Classifier: A Computational Linguistic Study

S M Abdullah Shafi; Myesha Samia; Sultanul Arifeen Hamim

Malaysian Journal of Science and Advanced Technology (Sep 2024)

Affiliations

S M Abdullah Shafi: Computer Science Department, American International University-Bangladesh. Dhaka, Bangladesh
Myesha Samia: Computer Science Department, American International University-Bangladesh. Dhaka, Bangladesh
Sultanul Arifeen Hamim: Computer Science Department, American International University-Bangladesh. Dhaka, Bangladesh

Journal volume & issue: Vol. 4, no. 4

Abstract

Emotion analysis of Bengali text is challenging because of the intricate structure of the language and the lack of resources tailored to sentiment classification. In this paper, the authors use machine learning algorithms, specifically Gaussian Naive Bayes (GNB) and Support Vector Machine (SVM), to classify six emotions in Bengali text. The data is comprehensively preprocessed through segmentation, emoticon handling, stop-word removal, and stemming. Unigram and bigram features with term frequency-inverse document frequency (TF-IDF) weighting are used for feature extraction and selection to improve classification accuracy. The main aim of the paper is to present an in-depth analysis of emotion detection in Bengali text that can help researchers working on NLP problems in non-English languages. The research thus helps fill a gap in emotion analysis for Bengali, which remains underexplored compared with other languages. The methodology comprises dataset preparation, extensive preprocessing, feature extraction and selection, and classification. After rigorous experimentation, the GNB classifier attains an accuracy of 93.83%, demonstrating the effectiveness of the proposed model in capturing subtle emotional nuances in Bengali text.
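The pipeline outlined in the abstract (preprocessing, unigram and bigram TF-IDF features, Gaussian Naive Bayes classification) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, emoticon tags, suffix list, emotion labels, and six-sentence corpus are all hypothetical, and the names `Tfidf`, `GaussianNaiveBayes`, `preprocess`, `stem`, and `ngram_features` are invented for the sketch. The variance floor mirrors the `var_smoothing` idea in scikit-learn's `GaussianNB`, and the suffix stripper is a deliberately crude stand-in for a real Bangla stemmer.

```python
import math
import re
from collections import Counter

# Hypothetical resources for illustration only; not the paper's actual lists.
STOP_WORDS = {"আমি", "খুব", "এই", "এবং"}
EMOTICONS = {":)": " EMO_HAPPY ", ":(": " EMO_SAD "}
SUFFIXES = ("গুলো", "দের", "টা", "টি", "কে")  # checked longest first

def stem(token):
    """Crude suffix stripping that keeps at least two code points of stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Emoticon handling, segmentation, stop-word removal, then stemming."""
    for emoticon, tag in EMOTICONS.items():
        text = text.replace(emoticon, tag)
    # Segment on whitespace, the Bangla danda (।), and common punctuation.
    tokens = [t for t in re.split(r"[\s।,!?.]+", text) if t]
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def ngram_features(tokens):
    """Unigrams plus space-joined bigrams."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

class Tfidf:
    def fit(self, docs):
        df = Counter(f for doc in docs for f in set(doc))
        terms = sorted(df)
        self.vocab = {f: i for i, f in enumerate(terms)}
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        self.idf = [math.log((1 + len(docs)) / (1 + df[f])) + 1 for f in terms]
        return self

    def transform(self, docs):
        rows = []
        for doc in docs:
            counts = Counter(f for f in doc if f in self.vocab)
            total = sum(counts.values()) or 1
            row = [0.0] * len(self.vocab)
            for f, c in counts.items():
                i = self.vocab[f]
                row[i] = c / total * self.idf[i]  # relative frequency x IDF
            rows.append(row)
        return rows

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

class GaussianNaiveBayes:
    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        # Variance floor: a fraction of the largest per-feature variance.
        eps = self.var_smoothing * (max(variance(col) for col in zip(*X)) or 1.0)
        self.params = {}
        for c in sorted(set(y)):
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            variances = [variance(col) + eps for col in cols]
            self.params[c] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def log_posterior(self, x, c):
        """Unnormalized log P(c | x): log prior + sum of Gaussian log-densities."""
        log_prior, means, variances = self.params[c]
        return log_prior - sum(
            0.5 * math.log(2 * math.pi * var) + (xi - mu) ** 2 / (2 * var)
            for xi, mu, var in zip(x, means, variances)
        )

    def predict(self, X):
        return [max(self.params, key=lambda c: self.log_posterior(x, c)) for x in X]

# Hypothetical toy corpus and labels (the paper's dataset is not reproduced).
texts = [
    "আমি আজ খুব খুশি :)",
    "বন্ধুদের সাথে দারুণ আনন্দ হলো!",
    "আমার মন খুব খারাপ :(",
    "এই দুঃখ সহ্য করা কঠিন।",
    "এত রাগ হচ্ছে যে কিছু বলার নেই",
    "তোমার উপর আমি ভীষণ বিরক্ত!",
]
labels = ["joy", "joy", "sadness", "sadness", "anger", "anger"]

features = [ngram_features(preprocess(t)) for t in texts]
vectorizer = Tfidf().fit(features)
X = vectorizer.transform(features)
model = GaussianNaiveBayes().fit(X, labels)
print(model.predict(X))
```

On this toy corpus the model should simply reproduce its training labels; a meaningful evaluation needs a held-out test split. In practice, scikit-learn's `TfidfVectorizer(ngram_range=(1, 2))` and `GaussianNB` cover the same steps, with the caveat that the sparse TF-IDF matrix must be converted to a dense array before calling `GaussianNB.fit`.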

Published in Malaysian Journal of Science and Advanced Technology

ISSN: 2785-8901 (Online)
Publisher: Penteract Technology
Country of publisher: Malaysia
LCC subjects: Technology
Website: https://mjsat.com.my