Benchmarking a large Twitter dataset for Arabic emotion analysis

Ahmed El-Sayed; Mohamed Abougabal; Shaimaa Lazem

doi:10.1007/s42452-023-05437-1

SN Applied Sciences (Jul 2023)

Affiliations

Ahmed El-Sayed: Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University
Mohamed Abougabal: Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University
Shaimaa Lazem: City of Scientific Research and Technological Applications

DOI: https://doi.org/10.1007/s42452-023-05437-1
Journal volume & issue: Vol. 5, no. 8
pp. 1 – 13

Abstract

The scarcity of annotated Arabic emotion datasets limits the effectiveness of emotion detection applications. Techniques such as semi-supervised self-learning annotation and transfer learning from models trained on large annotated datasets are increasingly considered economical alternatives by researchers working on Arabic sentiment and emotion detection tasks. Examining the quality of data annotated with these techniques is particularly important in applications that require fine-grained emotion detection, such as mental health applications. This paper contributes an approach to benchmarking a large Arabic emotion dataset annotated through semi-supervised self-learning. The quality of the annotation approach is demonstrated by extracting the lexical correlation of each emotion and conducting content analysis. Further, through a comprehensive set of experiments, we show the effectiveness of transfer learning from the large dataset to smaller datasets in emotion and sentiment classification tasks.

Published in SN Applied Sciences

ISSN: 2523-3963 (Print); 2523-3971 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science; Technology
Website: https://www.springer.com/snas
