IEEE Access (Jan 2022)
Privacy Aware Affective State Recognition From Visual Data
Abstract
Affective state recognition is a key component of any system equipped with emotional awareness and intelligence. The ability to recognize emotions allows a machine to better understand the user's requirements and to guide its decisions and responses, thus establishing a more connected relationship with the human. It is usually assumed that emotion recognition from visual data relies mainly on facial features, which poses the dilemma of invading the user's privacy and capturing their identity; this is unacceptable to many people, especially in public human-machine interaction (HMI) setups. On the other hand, bodily reactions and background context can provide sufficient visual emotional cues and are less susceptible to contextual influences than facial expressions. Consequently, this paper investigates the recognition of affective state from visual data captured during a naturalistic conversation, from a perspective similar to that of HMI. The faces were masked to conceal the identity of the users. A deep learning recognition model based on a combined Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) architecture is employed to classify the user’s affective state into two levels of arousal and valence, as well as their quadrant combinations. The experiments were conducted using two different labeling schemes mimicking the perspectives of the self and of the conversation partner. The results show that affective state recognition from masked data using the proposed model achieves performance comparable to that obtained from raw data with facial expressions (up to 96.82%, 95.91%, and 91.52% for arousal, valence, and quadrant class recognition, respectively). This paves the way for privacy-aware emotion recognition systems that could be widely accepted by users.
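The relationship between the two binary targets and the quadrant classes mentioned above can be illustrated with a minimal sketch. The class names (`HAHV`, `HALV`, `LAHV`, `LALV`) and the function `quadrant` are hypothetical and chosen only for illustration; the label encoding actually used in the paper is not stated in the abstract.

```python
from typing import Literal

Level = Literal["low", "high"]

# Hypothetical class names (assumption): H/L = high/low, A = arousal, V = valence.
QUADRANTS = {
    ("high", "high"): "HAHV",
    ("high", "low"): "HALV",
    ("low", "high"): "LAHV",
    ("low", "low"): "LALV",
}


def quadrant(arousal: Level, valence: Level) -> str:
    """Combine a two-level arousal label and a two-level valence label
    into one of four quadrant classes."""
    return QUADRANTS[(arousal, valence)]
```

Under this assumed encoding, `quadrant("high", "low")` yields the high-arousal, low-valence class, so the quadrant task is a four-class problem built from the two binary tasks.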
Keywords