How can we manage Offensive Text in Social Media - A Text Classification Approach using LSTM-BOOST

Md. Anwar Hussen Wadud; Muhammad Mohsin Kabir; M.F. Mridha; M. Ameer Ali; Md. Abdul Hamid; Muhammad Mostafa Monowar

International Journal of Information Management Data Insights (Nov 2022)

Affiliations

Md. Anwar Hussen Wadud: Department of Computer Science & Engineering, Bangladesh University of Business & Technology, Dhaka, Bangladesh
Muhammad Mohsin Kabir: Department of Computer Science & Engineering, Bangladesh University of Business & Technology, Dhaka, Bangladesh
M.F. Mridha: Department of Computer Science & Engineering, American International University Bangladesh, Dhaka, Bangladesh; Corresponding author.
M. Ameer Ali: Department of Computer Science & Engineering, Bangladesh University of Business & Technology, Dhaka, Bangladesh
Md. Abdul Hamid: Department of Information Technology, Faculty of Computing & Information Technology, King Abdulaziz University, Jeddah-21589, Kingdom of Saudi Arabia
Muhammad Mostafa Monowar: Department of Information Technology, Faculty of Computing & Information Technology, King Abdulaziz University, Jeddah-21589, Kingdom of Saudi Arabia

Journal volume & issue: Vol. 2, no. 2, p. 100095

Abstract

Offensive content is increasingly used to harass and criticize people on social media platforms. This paper proposes an offensive text classification algorithm named LSTM-BOOST, which combines a Long Short-Term Memory (LSTM) model with ensemble learning to recognize offensive Bengali texts on various social media platforms. The proposed LSTM-BOOST model uses a modified AdaBoost algorithm that employs principal component analysis (PCA) along with LSTM networks. In the LSTM-BOOST model, the dataset is divided into three categories, and PCA and LSTM networks are applied to each part of the dataset to capture the most significant variance and reduce the weighted error of the model's weak hypotheses. Furthermore, different classifiers are used for baseline experiments, and the model is evaluated with various word embedding methods. Our experiments show that LSTM-BOOST outperforms most of the baseline architectures, achieving an F1-score of 92.61% on the Bengali offensive text from Social Platforms (BHSSP) dataset.
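
To make the boosting idea in the abstract concrete, the following is a minimal sketch of standard (unmodified) AdaBoost, showing how the weighted error of each weak hypothesis drives the sample re-weighting. It is not the authors' LSTM-BOOST: the weak learner here is a hypothetical one-dimensional threshold stump standing in for an LSTM network, the PCA step is omitted, and the toy data, the label convention (+1 offensive, -1 not offensive), and all function names are assumptions made for illustration.

```python
import math

def stump_predict(x, threshold, polarity):
    # Hypothetical weak hypothesis (a stand-in for an LSTM classifier):
    # returns `polarity` when x >= threshold, otherwise `-polarity`.
    return polarity if x >= threshold else -polarity

def fit_stump(xs, ys, weights):
    # Search every threshold/polarity pair and keep the one with the
    # lowest weighted error under the current sample weights.
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n          # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak learner
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    # Final hypothesis: sign of the alpha-weighted vote of all weak learners.
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy one-dimensional data (assumed): +1 = offensive, -1 = not offensive.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(xs, ys, rounds=3)
train_predictions = [predict(ensemble, x) for x in xs]
print(train_predictions)
```

No single stump can separate this toy data, but the weighted vote of several stumps can, which is the motivation for boosting weak hypotheses. In a setting closer to the paper, each stump would presumably be replaced by an LSTM trained on PCA-reduced embeddings, with the same weighted-error bookkeeping around it.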

Published in International Journal of Information Management Data Insights

ISSN: 2667-0968 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.journals.elsevier.com/international-journal-of-information-management-data-insights
