International Journal of Cognitive Computing in Engineering (Jan 2024)

A tweet sentiment classification approach using an ensemble classifier

  • Vidyashree KP,
  • Rajendra AB,
  • Gururaj HL,
  • Vinayakumar Ravi,
  • Moez Krichen

Journal volume & issue
Vol. 5
pp. 170 – 177

Abstract

Social media users are more receptive to products or events and share their thoughts through raw textual data, which is classified as semi-structured data. This data, which is expressed in a variety of terminologies, is noisy by nature; alongside superfluous details, however, it contains important information that gives analysts a way to identify patterns and knowledge. This hidden information must be extracted from language data in order to make informed decisions and create strategic plans for entering new markets. Natural language processing (NLP) and data mining are among the most prominent fields of study here, especially for sentiment analysis, the process of identifying the feelings and insights concealed in the data. Twitter is one of the most significant microblogging platforms, with millions of users. These users post status updates, known as tweets, and use hashtags to share sentiments on different topics. Twitter is therefore regarded as a significant real-time source and as one of the most active opinion indicators. The volume of information produced by Twitter is enormous, and manually scanning the entire data set is a difficult process. This paper proposes an ensemble classifier that categorizes the emotion of tweets by polarity, namely positive or negative. The ensemble is a combination of Random Forest (RF), Support Vector Machine (SVM) and Decision Tree (DT) classifiers. The data is collected through the Twitter API and analysed autonomously to determine the public view on a particular topic. The features obtained after dimensionality reduction using LDA undergo feature selection using a wrapper-based technique. The iterative wrapper-based technique predicts a score for each feature; features with low scores are discarded, and features with high scores are passed on for classification. The ensemble classifier uses the Adaptive Boosting (AdaBoost) technique, in which the outputs of the Machine Learning (ML) classifiers are combined to produce a single output. AdaBoost combines weak classifiers and uses their predictions to build a stronger classifier. The experimental results show that the proposed ensemble classifier achieves an accuracy of 93.42 %, which is higher than that of the existing Convolutional Bidirectional Long Short-Term Memory (ConvBiLSTM) classifier and the Hybrid Lexicon-Naïve Bayes Classifier (HL-NBC), which achieve classification accuracies of 91.53 % and 89.61 %, respectively.
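
The boosting idea mentioned in the abstract, combining weak classifiers into a stronger one, can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's base learners are RF, SVM and DT trained on tweet features, whereas the sketch below assumes one-feature threshold rules (decision stumps) as stand-in weak learners and a made-up toy data set with labels +1 (positive) and -1 (negative). All function names are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump.

    A stump predicts `sign` when row[feature] <= threshold, else `-sign`.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign

def adaboost_fit(X, y, rounds=3):
    """Fit a list of (alpha, stump) pairs with discrete AdaBoost."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)         # avoid division by zero
        if err >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)   # vote weight of this stump
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Toy data (an assumption for illustration): no single threshold separates it.
X_toy = [[1], [2], [3], [4], [5], [6]]
y_toy = [1, 1, -1, -1, 1, 1]

for rounds in (1, 3):
    model = adaboost_fit(X_toy, y_toy, rounds=rounds)
    correct = sum(adaboost_predict(model, r) == t for r, t in zip(X_toy, y_toy))
    print(f"rounds={rounds}: {correct}/{len(y_toy)} training samples correct")
```

On this toy set a single stump cannot fit all labels, while the weighted vote of several stumps can, which is the sense in which boosting turns weak classifiers into a better one. In practice a library implementation with the paper's base learners would replace these hand-written stumps.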

Keywords