Engineering Proceedings (Nov 2023)
Optimizable Ensemble Regression for Arousal and Valence Predictions from Visual Features
Abstract
The cognitive state of a person can be categorized using the circumplex model of emotional states, a continuous model with two dimensions: arousal and valence. We use the Remote Collaborative and Affective Interactions (RECOLA) database, which includes audio, video, and physiological recordings of interactions between human participants, to predict arousal and valence values using machine learning techniques. To allow learners to focus on the most relevant data, features are extracted from the raw data. Such features can be predesigned or learned. Learned features are discovered automatically and used by deep learning solutions. Predesigned features are calculated before training and fed into the learner. Our previous work on video recordings focused on learned features. In this paper, we extend our work to predesigned visual features extracted from video recordings. We process these features by applying time delay and sequencing, arousal/valence labelling, and shuffling and splitting. We then train and test regressors to predict arousal and valence values. Our results outperform those from the literature. Using an optimizable ensemble, we achieve a root mean squared error (RMSE), Pearson’s correlation coefficient (PCC), and concordance correlation coefficient (CCC) of 0.1033, 0.8498, and 0.8001, respectively, on arousal predictions, and 0.07016, 0.8473, and 0.8053, respectively, on valence predictions.
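For readers unfamiliar with the three reported metrics, the following is a minimal Python sketch of how RMSE, PCC, and CCC are commonly computed from gold-standard and predicted values. It is not the authors' evaluation code: the function names and the use of NumPy are assumptions made for illustration, and the CCC uses population (biased) variances, which is one common convention.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between labels and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def pcc(y_true, y_pred):
    """Pearson's correlation coefficient (linear association only)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def ccc(y_true, y_pred):
    """Concordance correlation coefficient.

    Unlike PCC, CCC also penalizes differences in mean and scale
    between the predictions and the labels.
    Assumption: population variance and covariance (divide by N).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return float(2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2))
```

A prediction that is perfectly correlated with the labels but shifted by a constant offset keeps a PCC of 1 while its CCC drops below 1, which is why both are reported alongside RMSE.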
Keywords