Hybrid Feature Selection Framework for Sentiment Analysis on Large Corpora

Kayode Sakariyau Adewole; Abdullateef Oluwagbemiga Balogun; Muiz Raheem; Muhammed K. Jimoh; Rasheed Gbenga Jimoh; Modinat Abolore Mabayoje; Fatima E. Usman-Hamza; Abimbola Ganiyat Akintola; Ayisat Wuraola Asaju-Gbolagade

doi:10.5455/jjcit.71-1609858713

Jordanian Journal of Computers and Information Technology (Jun 2021)

Affiliations

Kayode Sakariyau Adewole: Department of Computer Science, University of Ilorin, Ilorin, Nigeria.
Abdullateef Oluwagbemiga Balogun: Department of Computer Science, University of Ilorin, Ilorin, Nigeria.
Muiz Raheem: Department of Computer Science, University of Ilorin, Ilorin, Nigeria.
Muhammed K. Jimoh: Department of Education Technology, University of Ilorin, Ilorin, Nigeria.
Rasheed Gbenga Jimoh: Department of Computer Science, University of Ilorin, Ilorin, Nigeria.
Modinat Abolore Mabayoje: Department of Computer Science, University of Ilorin, Ilorin, Nigeria.
Fatima E. Usman-Hamza: Department of Computer Science, University of Ilorin, Ilorin, Nigeria.
Abimbola Ganiyat Akintola: Department of Computer Science, University of Ilorin, Ilorin, Nigeria.
Ayisat Wuraola Asaju-Gbolagade: Department of Computer Science, University of Ilorin, Ilorin, Nigeria.

Journal volume & issue: Vol. 7, no. 2
pp. 130 – 151

Abstract

Sentiment analysis has drawn considerable research attention in recent years owing to its applicability in determining users' opinions, sentiments and emotions from large collections of textual data. The goal of sentiment analysis is to improve user experience by deploying robust techniques that mine opinions and emotions from large corpora. Although there are a number of studies on sentiment analysis and opinion mining from textual information, domain-specific words such as slang, abbreviations and grammatical mistakes pose serious challenges to existing sentiment analysis methods. Research efforts have therefore focused on finding the most discriminative attributes that can help capture users' opinions from textual datasets. In this paper, we focus on identifying an effective, discriminative subset of features that can aid the classification of users' opinions from large corpora. This study proposes a hybrid feature selection framework based on the hybridization of filter- and wrapper-based feature selection methods. Correlation feature selection (CFS), a filter-based approach, is hybridized with Boruta and Recursive Feature Elimination (RFE), which are wrapper-based feature selection methods, to identify the most discriminative feature subsets for sentiment analysis. Four publicly available sentiment analysis datasets (Amazon, Yelp, IMDB and Kaggle) were used to evaluate the performance of the proposed hybrid feature selection framework. This study evaluated three classification algorithms, Support Vector Machine (SVM), Naïve Bayes (NB) and Random Forest (RF), to ascertain the superiority of the proposed approach. Experimental results across the different contexts represented by these datasets showed that CFS combined with Boruta produced promising results, especially when the selected features were passed to the RF classifier.
Indeed, the proposed hybrid framework provides an effective way of predicting users' opinions and emotions while giving substantial consideration to predictive accuracy. [JJCIT 2021; 7(2): 130-151]
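The filter-then-wrapper idea described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a dense numeric feature matrix (for example, TF-IDF features already extracted from the reviews), scores subsets with the standard CFS merit using absolute Pearson correlation in place of symmetrical uncertainty, replaces CFS's usual best-first search with a simpler greedy forward search, and uses a random forest as the RFE estimator. The helper names `abs_corr`, `cfs_merit`, `cfs_forward` and `hybrid_cfs_rfe` are hypothetical, and the demo uses synthetic data rather than the paper's datasets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score


def abs_corr(a, b):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else abs(float((a * b).sum() / denom))


def cfs_merit(X, y, subset):
    """CFS merit: k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    k = len(subset)
    # Mean feature-class correlation (assumption: Pearson, not symmetrical uncertainty)
    r_cf = np.mean([abs_corr(X[:, j], y) for j in subset])
    if k == 1:
        return r_cf
    # Mean feature-feature intercorrelation over all distinct pairs in the subset
    r_ff = np.mean([abs_corr(X[:, a], X[:, b])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(X, y):
    """Filter stage: greedy forward search that stops when merit stops rising."""
    remaining = list(range(X.shape[1]))
    selected, best = [], 0.0
    while remaining:
        scores = {j: cfs_merit(X, y, selected + [j]) for j in remaining}
        j = max(scores, key=scores.get)  # lowest index wins ties
        if scores[j] <= best:
            break  # no candidate improves the merit
        best = scores[j]
        selected.append(j)
        remaining.remove(j)
    return selected


def hybrid_cfs_rfe(X, y, n_final, random_state=0):
    """Wrapper stage: RFE with a random forest, run only on the CFS subset."""
    cfs_idx = cfs_forward(X, y)
    n_final = min(n_final, len(cfs_idx))
    rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=random_state),
              n_features_to_select=n_final)
    rfe.fit(X[:, cfs_idx], y)
    # Map RFE's mask on the reduced matrix back to original column indices
    return [j for j, keep in zip(cfs_idx, rfe.support_) if keep]


if __name__ == "__main__":
    # Synthetic stand-in for a vectorized sentiment corpus
    X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                               n_redundant=2, random_state=0)
    selected = hybrid_cfs_rfe(X, y, n_final=3)
    acc = cross_val_score(RandomForestClassifier(random_state=0),
                          X[:, selected], y, cv=5).mean()
    print("selected features:", selected, "RF CV accuracy: %.3f" % acc)
```

Running the cheap CFS filter first shrinks the candidate pool, so the more expensive wrapper refits its classifier only on a handful of columns. For the CFS+Boruta combination that the abstract reports as most promising, the RFE step could be swapped for a Boruta wrapper, for example the third-party BorutaPy implementation in the `boruta` package; that substitution is not shown here.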

Published in Jordanian Journal of Computers and Information Technology

ISSN: 2413-9351 (Print); 2415-1076 (Online)
Publisher: Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT)
Country of publisher: Jordan
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://jjcit.org/