IEEE Access (Jan 2023)
Fine-Tuning Transformer Models Using Transfer Learning for Multilingual Threatening Text Identification
Abstract
Threatening content detection on social media has recently gained attention. There is very limited work regarding threatening content detection in low-resource languages, especially in Urdu. Furthermore, previous work explored only mono-lingual approaches, and multi-lingual threatening content detection was not studied. This research addressed the task of Multi-lingual Threatening Content Detection (MTCD) in Urdu and English languages by exploiting transfer learning methodology with fine-tuning techniques. To address the multi-lingual task, we investigated two methodologies: 1) Joint multi-lingual, and 2) Joint-translated method. The former approach employs the concept of building a universal classifier for different languages whereas the latter approach applies the translation process to transform the text into one language and then perform classification. We explore the Multilingual Representations for Indian Languages (MuRIL) and Robustly Optimized BERT Pre-Training Approach (RoBERTa) with fine-tuning that already demonstrated state-of-the-art in capturing the contextual and semantic characteristics within the text. For hyper-parameters, manual search and grid search strategies are utilized to find the optimum values. Various experiments are performed on bi-lingual English and Urdu datasets and findings revealed that the proposed methodology outperformed the baselines and showed benchmark performance. The RoBERTa model achieved the highest performance by demonstrating 92% accuracy and 90% macro F1-score with the joint multi-lingual approach.
Keywords