IEEE Access (Jan 2024)

Emo-SL Framework: Emoji Sentiment Lexicon Using Text-Based Features and Machine Learning for Sentiment Analysis

  • Manar Alfreihat,
  • Omar Saad Almousa,
  • Yahya Tashtoush,
  • Anas AlSobeh,
  • Khalid Mansour,
  • Hazem Migdady

DOI
https://doi.org/10.1109/ACCESS.2024.3382836
Journal volume & issue
Vol. 12
pp. 81793 – 81812

Abstract

With the rise of social media networks, the analysis of sentiment and opinions in textual data has gained significant importance. However, sentiment analysis of informal Arabic text is challenging because of morphological complexity and dialectal variation. This research develops an Emoji Sentiment Lexicon (Emo-SL) tailored to Arabic-language tweets and demonstrates the performance gains from combining emoji-based features with machine learning (ML) for sentiment classification. We constructed Emo-SL from a corpus of 58K Arabic tweets containing emojis, calculating sentiment scores for 222 frequently occurring emojis based on their distribution across positive and negative categories. Emoji weighting is integrated with lexicon-based text feature extraction to train classifiers on an Arabic tweet dataset. ML models, including Support Vector Machines (SVM), Naive Bayes, Random Forests, and K-Nearest Neighbors (KNN), are evaluated after optimal preprocessing and normalization. The results show that adding Emo-SL-derived emoji features to ML classifiers can significantly improve accuracy by 26.7% over textual features alone. The emoji-aware integrated approach achieves an 89% F1 score, outperforming the rule-based VADER sentiment analyzer. An analysis of n-gram impacts further confirms the value of fusing emoji and text semantics for Arabic sentiment classification. Emo-SL provides an effective framework for extracting nuanced emotional insights from noisy micro-text and demonstrates the potential of contextualized emoji understanding to advance multilingual sentiment analysis.
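
The abstract says emoji scores are derived from each emoji's distribution across positive and negative tweets, but it does not give the formula. As a minimal sketch only, the Python below assumes a normalized difference, (positive count minus negative count) divided by total count, which yields a score in [-1, 1]. The function names `build_emoji_lexicon` and `emoji_feature`, the `min_count` threshold, and the `"pos"`/`"neg"` labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_emoji_lexicon(tweets, min_count=1):
    """Build an emoji -> score mapping from labeled tweets.

    tweets: iterable of (emojis, label) pairs, where emojis is a list of
    emoji strings and label is "pos" or "neg" (hypothetical label names).
    The score formula is an assumption, not the paper's stated method.
    """
    pos, neg = Counter(), Counter()
    for emojis, label in tweets:
        if label == "pos":
            target = pos
        elif label == "neg":
            target = neg
        else:
            continue  # ignore tweets with any other label
        # Count each emoji at most once per tweet.
        for emoji in set(emojis):
            target[emoji] += 1

    lexicon = {}
    for emoji in set(pos) | set(neg):
        total = pos[emoji] + neg[emoji]
        if total >= min_count:
            # Assumed score: normalized difference in [-1, 1].
            lexicon[emoji] = (pos[emoji] - neg[emoji]) / total
    return lexicon


def emoji_feature(emojis, lexicon):
    """Average lexicon score of the known emojis in one tweet (0.0 if none)."""
    scores = [lexicon[e] for e in emojis if e in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    sample = [
        (["😀"], "pos"),
        (["😀"], "pos"),
        (["😀", "😢"], "neg"),
        (["😢"], "neg"),
    ]
    lexicon = build_emoji_lexicon(sample)
    print(lexicon)
    print(emoji_feature(["😀", "😢"], lexicon))
```

A value such as the one returned by `emoji_feature` could then be appended to a tweet's text-based feature vector before training a classifier; how the paper actually combines the two feature sets is not specified in the abstract.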

Keywords