Jisuanji kexue yu tansuo (Sep 2024)
Fusion of Global Enhancement and Local Attention Features for Expression Recognition Network
Abstract
To reduce the impact of occlusion and pose variation on facial expression recognition in natural scenes, an expression recognition network that fuses global enhancement and local attention features (GE-LA) is proposed. First, to capture enhanced global context, a channel-spatial global feature enhancement structure is constructed: a channel flow module (CFM) extracts symmetric multi-scale channel semantics, a spatial flow module (SFM) extracts pixel-level spatial semantics, and the two are combined to generate globally enhanced features. Second, to extract local detail features, the efficient channel attention (ECA) mechanism is extended into a channel-spatial attention (CSA) mechanism, on which a local attention module (LAM) is built to obtain high-level channel and spatial semantics. Finally, to improve the network's robustness to interference such as occlusion and pose variation, an adaptive strategy computes a weighted fusion of the global enhancement features and the local attention features, and expression classification is performed on the adaptively fused features. Experiments on the in-the-wild facial expression datasets RAF-DB and FERPlus show that the proposed network achieves recognition rates of 89.82% and 89.93%, respectively, which are 13.39 and 10.62 percentage points higher than those of the baseline network ResNet50. Compared with related methods, the proposed method reduces the influence of occlusion and pose variation and achieves better expression recognition performance in natural scenes.
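The adaptive weighted fusion step can be illustrated with a minimal sketch. The abstract does not state how the fusion weights are computed, so this example assumes two learnable scalar logits normalized with a softmax; the names `softmax`, `adaptive_fusion`, and `logits` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Normalize a list of logits into weights that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(global_feat, local_feat, logits):
    """Weighted sum of a global and a local feature vector.

    Assumption: `logits` holds two scalars that would be learned during
    training; the paper's actual weighting scheme may differ.
    """
    if len(global_feat) != len(local_feat):
        raise ValueError("feature vectors must have the same length")
    w_global, w_local = softmax(logits)
    return [w_global * g + w_local * l for g, l in zip(global_feat, local_feat)]


# Equal logits give equal weights, so the result is the element-wise mean.
fused = adaptive_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])
```

In a trained network the logits would be updated by backpropagation, letting the model rely more on local attention features when, for example, part of the face is occluded.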
Keywords