IEEE Access (Jan 2020)
Weakly Supervised Local-Global Attention Network for Facial Expression Recognition
Abstract
Combining global and local features is an effective way to improve discriminative performance in facial expression recognition tasks. Existing methods are limited in that they fail to extract crucial local features and ignore the complementary effects of local and global features. To address these problems, this paper proposes a Weakly Supervised Local-Global Attention Network (WS-LGAN), which uses the attention mechanism to handle both part localization and feature fusion. First, an Attention Map Generator is designed to produce a set of attention maps under weak supervision. It mimics the attention mechanism of the human brain and quickly finds local regions of interest. Second, bilinear attention pooling is employed to generate and refine local features based on the attention maps. Third, a building block called the Selective Feature Unit is designed, which allows adaptive weighted fusion of global and local features before classification. In WS-LGAN, global and local features represent expressions from different aspects, so compared with methods that rely on a single type of feature, it benefits from their complementary strengths. Additionally, a contrastive loss is introduced for both local and global features to increase inter-class dispersion and intra-class compactness at different granularities. Experiments on three popular facial expression datasets, two lab-controlled and one real-world, show that WS-LGAN achieves state-of-the-art performance, demonstrating its effectiveness for facial expression recognition.
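The three components named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the flat-list tensor layout, the averaging of part features into a single local feature, the softmax gate with fixed scores, and the pairwise form of the contrastive loss with its margin are all assumptions made for illustration. In a real network the attention maps and gate scores would be produced by learned layers.

```python
import math


def bilinear_attention_pooling(features, attention_maps):
    """Pool one part feature per attention map (illustrative, not the paper's code).

    features: C channels, each a flat list of N spatial activations.
    attention_maps: M maps, each a flat list of N attention weights.
    Returns M part features of length C (attention-weighted spatial means).
    """
    parts = []
    for amap in attention_maps:
        n = len(amap)
        parts.append([sum(a * f for a, f in zip(amap, channel)) / n
                      for channel in features])
    return parts


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selective_fusion(global_feat, local_feat, gate_scores):
    """Adaptive weighted sum of global and local features.

    gate_scores holds two hypothetical logits; a trained model would
    predict them from the features instead of taking them as input.
    """
    w_global, w_local = softmax(gate_scores)
    return [w_global * g + w_local * l for g, l in zip(global_feat, local_feat)]


def contrastive_loss(x, y, same_class, margin=1.0):
    """Common pairwise contrastive loss (assumed form, assumed margin).

    Same-class pairs are pulled together; different-class pairs are
    pushed apart until their distance reaches the margin.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    if same_class:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, margin - dist) ** 2


# Toy input: C = 2 channels, N = 4 spatial positions, M = 2 attention maps.
features = [[1.0, 2.0, 3.0, 4.0],
            [0.0, 1.0, 0.0, 1.0]]
attention_maps = [[1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]]

parts = bilinear_attention_pooling(features, attention_maps)
# Simplification: average the part features into one local feature.
local_feat = [sum(col) / len(parts) for col in zip(*parts)]
# Global feature: plain average pooling over all spatial positions.
global_feat = [sum(channel) / len(channel) for channel in features]
# Equal gate scores give equal weights to the two feature types.
fused = selective_fusion(global_feat, local_feat, gate_scores=[0.0, 0.0])
```

With equal gate scores the fused vector is simply the midpoint of the global and local features; unequal scores shift the weight toward whichever feature type the gate favors, which is the intuition behind adaptive fusion.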
Keywords