IEEE Access (Jan 2024)

Detection of Sarcasm in Urdu Tweets Using Deep Learning and Transformer Based Hybrid Approaches

  • Muhammad Ehtisham Hassan,
  • Masroor Hussain,
  • Iffat Maab,
  • Usman Habib,
  • Muhammad Attique Khan,
  • Anum Masood

DOI
https://doi.org/10.1109/ACCESS.2024.3393856
Journal volume & issue
Vol. 12
pp. 61542 – 61555

Abstract

Sarcasm plays a significant role in human communication, especially on social media platforms, where users express their sentiments through humor, satire, and criticism. Identifying sarcasm is crucial for understanding sentiment and communicative context on platforms like Twitter. Because sarcastic content is inherently ambiguous, detecting it is a considerable challenge in natural language processing (NLP). Both the importance and the difficulty of the task increase in languages like Urdu, where NLP resources are limited. Traditional rule-based approaches fall short of the desired performance because sarcasm is subtle and context-dependent. However, recent advances in NLP, particularly transformer-based large language models (LLMs) such as BERT, offer promising solutions. In this research, we use a newly created Urdu sarcasm dataset comprising 12,910 tweets manually re-annotated into sarcastic and non-sarcastic classes. These tweets were drawn from a public Urdu tweet dataset of 19,995 tweets. We establish baseline results using deep learning classifiers: CNN, LSTM, GRU, BiLSTM, and CNN-LSTM. To capture contextual information more fully, we propose a novel hybrid model architecture that integrates multilingual BERT (mBERT) embeddings with BiLSTM and multi-head attention (MHA) for Urdu sarcasm detection. The proposed mBERT-BiLSTM-MHA model achieves an accuracy of 79.51% and an F1 score of 80.04%, outperforming the deep learning classifiers trained with fastText word embeddings.
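For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and F1 are computed for a two-class task such as sarcastic versus non-sarcastic. It is an illustration only: the abstract does not say how the F1 score was averaged, so the sketch assumes binary F1 with the sarcastic class as the positive label. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for a two-class task.

    Assumption: `positive` marks the sarcastic class; every other label
    is treated as non-sarcastic.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


# Toy labels (1 = sarcastic, 0 = non-sarcastic); made up for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, it can differ from accuracy, especially when the two classes are not evenly balanced or when errors concentrate in one class.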

Keywords