IEEE Access (Jan 2024)

Domain-Enhanced Prompt Learning for Chinese Implicit Hate Speech Detection

  • Yaosheng Zhang,
  • Tiegang Zhong,
  • Tingjun Yi,
  • Haoming Li

DOI
https://doi.org/10.1109/ACCESS.2024.3351804
Journal volume & issue
Vol. 12
pp. 13773 – 13782

Abstract

Hate Speech Detection, which aims to identify the harmful speech that is widespread on social networks, is a long-standing research field. Despite its significance, previous efforts have focused almost exclusively on English, leading to a notable scarcity of datasets for Hate Speech Detection in Chinese. Moreover, two forms of hate speech are emerging under stringent regulatory environments: 1) domain specificity, which manifests as nuanced, harder-to-detect aggressive rhetoric particular to individual domains; and 2) implicitness, characterized by indirect, abstract, and ambiguous cold language. This evolution presents additional complexities for Multi-domain Implicit Hate Speech Detection in Chinese. To fill this gap, we construct a 20,000-sample implicit hate speech detection dataset covering nine domains. Furthermore, we introduce a Domain-enhanced Prompt Learning (DePL) approach tailored to multi-domain and data-limited scenarios. The method combines domain feature fusion, which encodes the domain-specific features of hate speech, with recent advances in prompt learning, addressing the dual challenges of domain diversity and data scarcity. Experimental results demonstrate that DePL achieves state-of-the-art (SOTA) results on our benchmark dataset in both few-shot and full-scale scenarios.

Keywords