IEEE Access (Jan 2024)

Uncertainty-Based Learning of a Lightweight Model for Multimodal Emotion Recognition

  • Anamaria Radoi
  • George Cioroiu

DOI: https://doi.org/10.1109/ACCESS.2024.3450674
Journal volume & issue: Vol. 12, pp. 120362–120374

Abstract

Emotion recognition is a key research topic in the Affective Computing domain, with applications in marketing, human-robot interaction, and healthcare. Continuous technological advances in sensors and the rapid development of artificial intelligence have led to breakthroughs in the interpretation of human emotions. In this paper, we propose a lightweight neural network architecture that extracts and analyzes multimodal information by reusing the same audio and visual networks across multiple temporal segments. Data collection and annotation for emotion recognition tasks remain challenging in terms of the required expertise and effort. Accordingly, the learning process of the proposed multimodal architecture follows an iterative procedure that starts from a small volume of annotated samples and improves the system step by step by assessing the model's uncertainty in recognizing discrete emotions. Specifically, at each epoch, the learning process is guided by the samples for which the model's predictions are most uncertain, and it integrates different modes of expressing emotions through a simple augmentation technique. The framework is tested on two publicly available multimodal datasets for emotion recognition, i.e., CREMA-D and RAVDESS, using 5-fold cross-validation. Compared to state-of-the-art methods, the achieved performance demonstrates the effectiveness of the proposed approach, with an overall accuracy of 74.2% on CREMA-D and 76.3% on RAVDESS. Moreover, with a small number of model parameters and a low inference time, the proposed neural network architecture is a strong candidate for deployment on platforms with limited memory and computational resources.
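The abstract does not state how model uncertainty is quantified; a common choice for discrete emotion classification is the predictive entropy of the softmax output. The sketch below illustrates, under that assumption, how the most uncertain samples in an annotation pool could be ranked to guide the next training epoch. The model signature model(audio, video), the loader format, and all function names are hypothetical illustrations, not the authors' implementation.

import torch
import torch.nn.functional as F

def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Per-sample entropy of the softmax distribution over emotion classes.

    Higher entropy means the model is less certain about its prediction.
    """
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(probs * log_probs).sum(dim=-1)

@torch.no_grad()
def select_uncertain_samples(model, pool_loader, k: int, device: str = "cpu") -> torch.Tensor:
    """Rank an unlabeled pool by predictive entropy and return the indices
    of the k most uncertain samples (candidates to guide the next epoch).

    Assumes pool_loader yields (audio, video) batches and that the model
    takes both modalities -- a hypothetical interface for illustration.
    """
    model.eval()
    scores = []
    for audio, video in pool_loader:
        logits = model(audio.to(device), video.to(device))
        scores.append(predictive_entropy(logits).cpu())
    scores = torch.cat(scores)
    return torch.topk(scores, k).indices

In an iterative loop such as the one the abstract describes, the returned indices would be annotated (or re-weighted) and merged into the training set before the next epoch, so that labeling effort concentrates on the samples the model finds hardest.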

Keywords