IEEE Access (Jan 2024)

IdSarcasm: Benchmarking and Evaluating Language Models for Indonesian Sarcasm Detection

  • Derwin Suhartono,
  • Wilson Wongso,
  • Alif Tri Handoyo

DOI
https://doi.org/10.1109/ACCESS.2024.3416955
Journal volume & issue
Vol. 12
pp. 87323 – 87332

Abstract

Sarcasm detection in the Indonesian language poses a unique set of challenges due to the linguistic nuances and cultural specificities of the Indonesian social media landscape. Understanding sarcasm in this context requires close attention to language patterns and to the socio-cultural background that shapes its use as a form of criticism and expression. In this study, we developed the first publicly available Indonesian sarcasm detection benchmark datasets from social media texts. We extensively evaluated classical machine learning algorithms, pre-trained language models, and recent large language models (LLMs). Our findings show that fine-tuning pre-trained language models is still superior to other techniques, achieving F1 scores of 62.74% and 76.92% on the Reddit and Twitter subsets, respectively. Further, we show that recent LLMs fail to perform zero-shot classification for sarcasm detection and that tackling data imbalance requires a more sophisticated data augmentation approach than our basic methods.

Keywords