MethodsX (Dec 2024)
An efficient method for disaster tweets classification using gradient-based optimized convolutional neural networks with BERT embeddings
Abstract
Disastrous events are actively discussed on microblogging platforms such as Twitter, and these discussions can lead to chaotic situations. In the era of machine learning and deep learning, such situations can be controlled more effectively by developing efficient methods and models that help classify real and fake tweets. In this research article, an efficient method named BERT Embedding based CNN model with RMSProp Optimizer is proposed to classify tweets related to disaster scenarios. Tweet classification is carried out with popular machine learning algorithms such as logistic regression and decision tree classifiers. Because these machine learning models show low accuracy, a Convolutional Neural Network (CNN) based deep learning model is selected as the primary classification method. The CNN's performance is improved by optimizing its parameters with gradient-based optimizers. To further raise accuracy and to capture contextual semantics from the text data, BERT embeddings are incorporated into the proposed model. The proposed method, BERT Embedding based CNN model with RMSProp Optimizer, achieved an F1 score of 0.80 and an accuracy of 0.83. The methodology presented in this research article comprises the following key contributions:
• Identification of a suitable text classification model that can effectively capture complex patterns when dealing with large vocabularies or nuanced language structures in disaster management scenarios.
• The method explores gradient-based optimization techniques, namely the Adam, Stochastic Gradient Descent (SGD), AdaGrad, and RMSProp optimizers, to identify the optimizer best suited to the characteristics of the dataset and the CNN model architecture.
• “BERT Embedding based CNN model with RMSProp Optimizer”, a method that classifies disaster tweets and captures semantic representations by leveraging BERT embeddings with appropriate feature selection, is presented, and the models are validated through comparative analysis.
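To make the optimizer comparison concrete, the four update rules named above can be written out side by side. The sketch below is a minimal pure-Python illustration, not the authors' training code: it assumes a hypothetical ill-conditioned quadratic loss (one direction 100x steeper than the other) in place of the CNN loss, and the learning rates, `rho`, `b1`, `b2`, and step count are illustrative choices. The key contrast is that RMSProp replaces AdaGrad's ever-growing sum of squared gradients with an exponential moving average, so its effective step size does not decay toward zero during long training.

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy objective standing in for the CNN training loss:
# f(w) = 0.5 * sum(a_i * w_i^2), with one direction 100x steeper than the other.
CURVATURE = [1.0, 100.0]


def loss(w: List[float]) -> float:
    return 0.5 * sum(a * x * x for a, x in zip(CURVATURE, w))


def grad(w: List[float]) -> List[float]:
    return [a * x for a, x in zip(CURVATURE, w)]


def sgd(w, g, state, t, lr=0.009):
    # Plain gradient step; lr must stay below 2/100 to be stable on the steep axis.
    return [x - lr * gi for x, gi in zip(w, g)]


def adagrad(w, g, state, t, lr=0.1, eps=1e-8):
    # Running SUM of squared gradients per parameter (never decays).
    acc = state.setdefault("acc", [0.0] * len(w))
    for i, gi in enumerate(g):
        acc[i] += gi * gi
    return [x - lr * gi / (math.sqrt(a) + eps) for x, gi, a in zip(w, g, acc)]


def rmsprop(w, g, state, t, lr=0.01, rho=0.9, eps=1e-8):
    # Exponential moving AVERAGE of squared gradients instead of a running sum.
    v = state.setdefault("v", [0.0] * len(w))
    for i, gi in enumerate(g):
        v[i] = rho * v[i] + (1 - rho) * gi * gi
    return [x - lr * gi / (math.sqrt(vi) + eps) for x, gi, vi in zip(w, g, v)]


def adam(w, g, state, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # First- and second-moment estimates with bias correction (t starts at 1).
    m = state.setdefault("m", [0.0] * len(w))
    v = state.setdefault("v", [0.0] * len(w))
    out = []
    for i, (x, gi) in enumerate(zip(w, g)):
        m[i] = b1 * m[i] + (1 - b1) * gi
        v[i] = b2 * v[i] + (1 - b2) * gi * gi
        m_hat = m[i] / (1 - b1 ** t)
        v_hat = v[i] / (1 - b2 ** t)
        out.append(x - lr * m_hat / (math.sqrt(v_hat) + eps))
    return out


OPTIMIZERS: Dict[str, Callable] = {
    "SGD": sgd, "AdaGrad": adagrad, "RMSProp": rmsprop, "Adam": adam,
}


def run(step: Callable, steps: int = 200) -> float:
    # Fresh parameters and optimizer state for every run.
    w, state = [1.0, 1.0], {}
    for t in range(1, steps + 1):
        w = step(w, grad(w), state, t)
    return loss(w)


if __name__ == "__main__":
    print(f"initial loss: {loss([1.0, 1.0]):.4f}")
    for name, step in OPTIMIZERS.items():
        print(f"{name:8s} final loss after 200 steps: {run(step):.6f}")
```

On a real CNN the ranking depends on the data and architecture, which is why the method compares the optimizers empirically rather than fixing one in advance; in practice one would use a framework implementation such as PyTorch's `torch.optim.RMSprop` instead of hand-written updates.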
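The core idea of combining BERT embeddings with a CNN can be sketched as a forward pass: each tweet is represented as a sequence of token embedding vectors, convolution filters slide over the token axis to detect local n-gram patterns, global max pooling keeps each filter's strongest response, and a sigmoid output gives the probability that the tweet refers to a real disaster. This is a minimal pure-Python sketch under stated assumptions: random vectors stand in for BERT embeddings (BERT-base produces 768-dimensional vectors), and the single convolution layer, filter width, filter count, and random weights are hypothetical, not the architecture reported by the authors.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def conv1d_relu(x: Matrix, kernels: List[Matrix], bias: List[float]) -> Matrix:
    """'Valid' 1D convolution over the token axis, followed by ReLU.

    x: seq_len x emb_dim token embeddings (e.g. rows of a BERT hidden state).
    kernels: n_filters kernels, each of shape width x emb_dim.
    Returns a (seq_len - width + 1) x n_filters feature map.
    """
    width, emb_dim = len(kernels[0]), len(x[0])
    if len(x) < width:
        raise ValueError("sequence shorter than the filter width")
    out = []
    for start in range(len(x) - width + 1):
        row = []
        for k, b in zip(kernels, bias):
            s = b + sum(k[j][d] * x[start + j][d]
                        for j in range(width) for d in range(emb_dim))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def global_max_pool(fmap: Matrix) -> List[float]:
    # Strongest response of each filter anywhere in the tweet.
    return [max(col) for col in zip(*fmap)]


def predict_proba(x: Matrix, kernels: List[Matrix], bias: List[float],
                  w_out: List[float], b_out: float) -> float:
    pooled = global_max_pool(conv1d_relu(x, kernels, bias))
    z = b_out + sum(wi * pi for wi, pi in zip(w_out, pooled))
    return 1.0 / (1.0 + math.exp(-z))  # P(tweet refers to a real disaster)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sizes chosen for illustration only.
    seq_len, emb_dim, width, n_filters = 12, 8, 3, 4
    x = [[rng.gauss(0, 1) for _ in range(emb_dim)] for _ in range(seq_len)]
    kernels = [[[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(width)]
               for _ in range(n_filters)]
    w_out = [rng.gauss(0, 0.1) for _ in range(n_filters)]
    p = predict_proba(x, kernels, [0.0] * n_filters, w_out, 0.0)
    print(f"P(disaster) = {p:.3f}")
```

In a trainable version, the kernels, biases, and output weights would be learned by minimizing binary cross-entropy with one of the gradient-based optimizers discussed in the contributions.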