IEEE Access (Jan 2023)

Cognitive Relationship-Based Approach for Urdu Sarcasm and Sentiment Classification

  • Muhammad Yaseen Khan,
  • Tafseer Ahmed,
  • Muhammad Shoaib Siddiqui,
  • Shaukat Wasi

DOI
https://doi.org/10.1109/ACCESS.2023.3325048
Journal volume & issue
Vol. 11
pp. 126661 – 126690

Abstract

Humans have a natural tendency to express their emotions, but they are also skilled at using sarcasm to shape how those feelings are conveyed. In cognitive computing and natural language processing research, sentiment analysis and sarcasm detection are typically treated as separate tasks, with each text analyzed in isolation. This approach overlooks the connection between sentiment and sarcasm. We believe that the two are closely related and should be analyzed together to achieve a better understanding of context and natural language. In this paper, we propose a new framework that leverages the Cognitive Relationship (CR) between sarcasm and sentiment to improve classification accuracy on both tasks. We have also created a new, nearly balanced dataset for sentiment and sarcasm classification in standard Urdu, containing 7,000 tweets that comprise over 210K tokens. To gain a better understanding of the data, we conducted exploratory data analysis on words, hashtags, and emojis. We then trained a variety of classical machine learning classifiers and tested them on different variations of the dataset. After a thorough analysis of the results and errors, we found that the CR-based approach for sarcasm and sentiment classification performed better than the traditional stand-alone (SA) approach. Among the classifiers, Linear Regression and eXtreme Gradient Boosting proved to be the most effective. CR-based sentiment classification showed a 9.3% improvement over the SA approach and an overall improvement of approximately 22% over the baseline distribution. Similarly, CR-based sarcasm classification showed a 9.1% improvement over the SA approach and an improvement of approximately 23.6% over the baseline distribution.

Keywords