Attention-Based Spatiotemporal-Aware Network for Fine-Grained Visual Recognition

Yili Ren; Ruidong Lu; Guan Yuan; Dashuai Hao; Hongjue Li

doi:10.3390/app14177755

Applied Sciences (Sep 2024)

Affiliations

Yili Ren: Research Institute of Petroleum Exploration and Development, Beijing 100083, China
Ruidong Lu: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
Guan Yuan: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
Dashuai Hao: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
Hongjue Li: School of Astronautics, Beihang University, Beijing 100191, China

DOI: https://doi.org/10.3390/app14177755
Journal volume & issue: Vol. 14, no. 17, p. 7755

Abstract

Current macro facial expression recognition technologies have achieved significant success on public benchmarks. However, in real-life scenarios, individuals may attempt to conceal their true emotions. Conventional expression recognition often overlooks subtle facial changes, so more fine-grained micro-expression recognition techniques are needed. Unlike ordinary facial expressions, micro-expressions have weak intensity and short duration, which are the two main obstacles to perceiving and interpreting them correctly. Meanwhile, most existing methods ignore correlations between pixels of visual data in the spatial and channel dimensions. In this paper, we propose a novel network structure, the Attention-based Spatiotemporal-aware network (ASTNet), for micro-expression recognition. In ASTNet, we combine ResNet and ConvLSTM into a holistic framework (ResNet-ConvLSTM) that extracts spatial and temporal features simultaneously. Moreover, we integrate two levels of attention, channel-level attention and spatial-level attention, into the ResNet-ConvLSTM. Channel-level attention is used to discriminate the importance of different channels, because their contributions to the overall presentation of a micro-expression vary. Spatial-level attention is leveraged to dynamically estimate weights for different regions, because regions differ in how strongly they reflect a micro-expression. Extensive experiments conducted on two benchmark datasets demonstrate that ASTNet achieves performance improvements of 4.25–16.02% and 0.79–12.93% over several state-of-the-art methods.
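
The abstract names channel-level and spatial-level attention but does not give their exact form. As a rough illustration only, the sketch below applies a generic squeeze-and-excitation-style channel gate followed by a pooled-map spatial gate to a single feature map. Every name here (`channel_attention`, `spatial_attention`, `w1`, `w2`, `mix`) and the gating formulas themselves are assumptions made for illustration, not the ASTNet implementation, and the ResNet-ConvLSTM backbone is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight each channel of feat (C x H x W nested lists).

    Assumed form: global average pool per channel, a small two-layer
    bottleneck (w1, then ReLU, then w2), and a sigmoid gate per channel.
    """
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    out = [[[x * wt for x in row] for row in ch] for ch, wt in zip(feat, weights)]
    return out, weights


def spatial_attention(feat, mix):
    """Reweight each spatial location of feat (C x H x W nested lists).

    Assumed form: average and max over channels at each location, mixed by
    two hypothetical scalars in `mix` (a stand-in for a learned convolution),
    then a sigmoid gate per location.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    weights = []
    for i in range(H):
        row = []
        for j in range(W):
            column = [feat[c][i][j] for c in range(C)]
            avg, mx = sum(column) / C, max(column)
            row.append(sigmoid(mix[0] * avg + mix[1] * mx))
        weights.append(row)
    out = [[[feat[c][i][j] * weights[i][j] for j in range(W)]
            for i in range(H)] for c in range(C)]
    return out, weights


# Toy usage with random values standing in for one frame's feature map.
random.seed(0)
C, H, W, R = 8, 4, 4, 2  # R is a hypothetical bottleneck reduction ratio
feat = [[[random.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(C // R)]
w2 = [[random.gauss(0, 0.1) for _ in range(C // R)] for _ in range(C)]

ch_out, ch_w = channel_attention(feat, w1, w2)
sp_out, sp_w = spatial_attention(ch_out, [0.5, 0.5])
```

In this sketch the channel gate is applied first and the spatial gate second; the abstract does not state the order used in ASTNet, so that ordering is also an assumption.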

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
