IEEE Access (Jan 2021)

SCEP—A New Image Dimensional Emotion Recognition Model Based on Spatial and Channel-Wise Attention Mechanisms

  • Bo Li,
  • Hui Ren,
  • Xuekun Jiang,
  • Fang Miao,
  • Feng Feng,
  • Libiao Jin

DOI
https://doi.org/10.1109/ACCESS.2021.3057373
Journal volume & issue
Vol. 9
pp. 25278 – 25290

Abstract

Images are an important carrier of emotional expression. Humans can understand the emotions in an image easily and quickly, whereas extracting accurate emotions remains a very challenging task for machines. In this study, we propose a novel spatial and channel-wise attention-based emotion prediction model, SCEP, to help computers recognize the emotions of images more accurately. SCEP integrates both spatial attention and channel-wise weight mechanisms into a classical convolutional neural network (CNN) layer structure to predict image emotions, on the grounds that the spatial attention mechanism can enhance the contrast between salient regions and potentially irrelevant regions, and that the channel-wise weight mechanism can emphasize informative features while suppressing less useful ones. The SCEP model outputs emotion values in a continuous 2-D valence and arousal space, so that it can express more emotions than a discrete classification of emotions allows. To validate the effectiveness of our model, we test it on an existing image dataset with a broad emotion distribution. Extensive experiments show that, compared to base models (i.e., VGG and ResNet) without spatial attention or channel-wise mechanisms, SCEP improves the accuracy of emotion prediction (evaluated by the concordance correlation coefficient) by approximately 3%-5% in the arousal domain and by approximately 3%-6% in the valence domain. We therefore conclude that SCEP yields higher accuracy in emotion prediction.
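
For readers unfamiliar with the evaluation metric named above, the following is a minimal sketch of the concordance correlation coefficient (CCC) in its standard form (Lin's definition, using population variances). The abstract does not state the exact variant the authors used, so the function name `concordance_correlation_coefficient` and the choice of population (1/n) statistics are assumptions made for illustration only.

```python
import numpy as np


def concordance_correlation_coefficient(y_true, y_pred):
    """Standard CCC between two 1-D sequences (illustrative sketch).

    Assumption: population (1/n) variance and covariance are used;
    the paper's own implementation may differ.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    mean_true = y_true.mean()
    mean_pred = y_pred.mean()

    # np.var defaults to ddof=0, i.e. the population variance.
    var_true = y_true.var()
    var_pred = y_pred.var()

    # Population covariance between labels and predictions.
    covariance = np.mean((y_true - mean_true) * (y_pred - mean_pred))

    # CCC penalizes both weak correlation and a shift in mean or scale.
    return 2.0 * covariance / (var_true + var_pred + (mean_true - mean_pred) ** 2)
```

Unlike the Pearson correlation, CCC drops below 1 when predictions are perfectly correlated with the labels but offset from them. This suits regression in a continuous valence-arousal space, where the absolute predicted value matters and not only the trend.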

Keywords