IEEE Access (Jan 2020)
Multiple Attention Network for Facial Expression Recognition
Abstract
One key challenge in facial expression recognition (FER) is the extraction of discriminative features from critical facial regions. Because of their promising ability to learn discriminative features, visual attention mechanisms are increasingly used to address pattern recognition problems. This paper presents a novel multiple attention network that simulates humans' coarse-to-fine visual attention to improve expression recognition performance. In the proposed network, a region-aware sub-net (RASnet) learns binary masks for locating expression-related critical regions at coarse-to-fine granularity levels, and an expression recognition sub-net (ERSnet) with a multiple attention (MA) block learns comprehensive discriminative features. Embedded in the convolutional layers, the MA block fuses diversified attention using the masks learned by the RASnet. The MA block contains a hybrid attention branch with a series of sub-branches, each providing region-specific attention. To exploit the complementary benefits of diversified attention, the MA block also has a weight-learning branch that adaptively learns the contributions of the different critical regions. Experiments have been carried out on two publicly available databases, RAF and CK+, yielding accuracies of 85.69% and 96.28%, respectively. The results indicate that our method achieves competitive or better performance than state-of-the-art methods.
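The fusion mechanism summarized above can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the function name, shapes, and the use of a softmax in the weight-learning branch are illustrative assumptions. Each binary mask from the RASnet gates the convolutional feature map (one region-specific attention sub-branch per mask), and adaptively learned weights combine the gated maps.

```python
import numpy as np

def ma_block_fusion(features, masks, region_scores):
    """Illustrative sketch of MA-block fusion (assumed shapes, not
    the paper's exact implementation).

    features:      (C, H, W) convolutional feature map
    masks:         (K, H, W) binary masks for K critical regions
    region_scores: (K,) raw scores from the weight-learning branch
    """
    # Weight-learning branch (assumption: softmax-normalized scores
    # stand in for the adaptively learned region contributions).
    w = np.exp(region_scores - region_scores.max())
    w /= w.sum()
    # Hybrid attention branch: each sub-branch gates the features
    # with one region mask, giving region-specific attention.
    gated = features[None, :, :, :] * masks[:, None, :, :]  # (K, C, H, W)
    # Fuse the diversified attention maps with the learned weights.
    fused = (w[:, None, None, None] * gated).sum(axis=0)    # (C, H, W)
    return fused
```

With equal region scores the branch reduces to a uniform average of the mask-gated features; training the scores lets more informative regions dominate the fused representation.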
Keywords