IEEE Access (Jan 2024)
Pseudo-Labeling With Large Language Models for Multi-Label Emotion Classification of French Tweets
Abstract
This study proposes a novel semi-supervised, pseudo-labeling-based approach to multi-label emotion classification of French tweets. The subjectivity of human emotional expression makes emotions difficult for machines to learn, so supervised models must be trained on large datasets annotated by multiple annotators. Creating such datasets, however, is costly and time-consuming, and aggregating the annotations of multiple annotators into a collective emotional judgment adds further complexity. Semi-supervised learning techniques have proven effective when labeled data are limited. Moreover, Large Language Models (LLMs), ChatGPT in particular, have demonstrated higher annotation accuracy than crowdsourcing platforms when both are evaluated against expert-annotated data. This work therefore leverages pseudo-labels generated by ChatGPT: using zero-shot, one-shot, and few-shot prompting, ChatGPT annotates the unlabeled portion of our dataset, and these pseudo-labels are then merged with the manual annotations to train a multi-label emotion classification model in a semi-supervised fashion. As part of this research, we also present a new French tweet dataset containing testimonials from people affected by an urban industrial incident. The dataset comprises 2,350 tweets, each manually annotated by three human annotators according to eight pre-identified emotions. We report benchmark results for multi-label emotion classification models trained under both fully supervised and semi-supervised (pseudo-labeling) regimes. Our findings show that our approach outperforms both standard pseudo-labeling and an established baseline.
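For concreteness, the following Python sketch illustrates the zero-shot variant of the pseudo-labeling step described above, assuming the OpenAI chat completions API as the ChatGPT backend. The model name, prompt wording, and emotion label set are illustrative assumptions, since the abstract does not specify them; the one-shot and few-shot variants would simply prepend one or more annotated example tweets to the prompt.

# Minimal sketch of zero-shot LLM pseudo-labeling for multi-label emotion
# classification. The model name, prompt, and EMOTIONS list are hypothetical
# placeholders, not the paper's actual configuration.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical label set; the paper's eight pre-identified emotions are not
# named in the abstract.
EMOTIONS = ["joy", "sadness", "anger", "fear",
            "surprise", "disgust", "trust", "anticipation"]

def pseudo_label(tweet: str) -> list[str]:
    """Ask the LLM to assign zero or more emotion labels to a French tweet."""
    prompt = (
        "You are an annotator for multi-label emotion classification of "
        f"French tweets. Allowed labels: {', '.join(EMOTIONS)}. "
        "Return only a JSON list containing every label expressed in the "
        "tweet (an empty list if none applies).\n\n"
        f"Tweet: {tweet}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",   # assumed ChatGPT backend
        temperature=0,           # deterministic annotations
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returns a bare JSON list as instructed; filter out
    # any labels outside the allowed set in case the model drifts.
    labels = json.loads(response.choices[0].message.content)
    return [label for label in labels if label in EMOTIONS]

def build_training_set(labeled, unlabeled):
    """Merge gold-labeled tweets with pseudo-labeled ones for semi-supervised
    training. labeled: list of (tweet, labels); unlabeled: list of tweets."""
    return labeled + [(t, pseudo_label(t)) for t in unlabeled]

The merged set produced by build_training_set can then be fed to any standard multi-label classifier; the pseudo-labeled tweets substitute for human annotation on the unlabeled portion of the corpus.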
Keywords