Applied Sciences (Aug 2024)

3D-STARNET: Spatial–Temporal Attention Residual Network for Robust Action Recognition

  • Jun Yang,
  • Shulong Sun,
  • Jiayue Chen,
  • Haizhen Xie,
  • Yan Wang,
  • Zenglong Yang

DOI: https://doi.org/10.3390/app14167154
Journal volume & issue: Vol. 14, no. 16, p. 7154

Abstract

Existing skeleton-based action recognition methods face two challenges: insufficient spatiotemporal feature mining and inefficient information transmission. To address these problems, this paper proposes the Spatial–Temporal Attention Residual Network for 3D human action recognition (3D-STARNET). The model improves action recognition performance through three main innovations: (1) conversion of skeleton points to heat maps, where a Gaussian transform renders skeleton point data as heat maps, reducing the model's strong dependence on raw skeleton coordinates and improving the stability and robustness of the data; (2) a spatiotemporal attention mechanism (STA), a novel mechanism that focuses on extracting key frames and key regions within frames and significantly strengthens the model's ability to identify behavioral patterns; and (3) a multi-stage residual structure (MS-Residual), which improves the efficiency of information flow through the network, mitigates the vanishing-gradient problem in deep networks, and raises the model's recognition efficiency. Experimental results on the NTU RGB+D 120 dataset show that 3D-STARNET substantially improves action recognition accuracy, with the overall network reaching a top-1 accuracy of 96.74%. The method not only addresses the robustness shortcomings of existing approaches but also improves the capture of spatiotemporal features, providing an efficient and widely applicable solution for skeleton-based action recognition.
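
The abstract does not give the rendering details of innovation (1), so the following is a minimal NumPy sketch of one common way to convert per-frame skeleton joints into per-joint Gaussian heat maps; the grid size, coordinate layout, and sigma value are illustrative assumptions, not the authors' parameters.

# Minimal sketch (not the authors' code): render skeleton joints of one
# frame as per-joint 2D Gaussian heat maps. Shapes and sigma are assumed.
import numpy as np

def joints_to_heatmaps(joints, height, width, sigma=2.0):
    """joints: (J, 2) array of (x, y) pixel coordinates for J joints.
    Returns a (J, height, width) array; each channel is a 2D Gaussian
    centered on the corresponding joint."""
    ys, xs = np.mgrid[0:height, 0:width]  # pixel coordinate grids
    heatmaps = np.zeros((len(joints), height, width), dtype=np.float32)
    for j, (x, y) in enumerate(joints):
        # Squared distance of every pixel to the joint location.
        d2 = (xs - x) ** 2 + (ys - y) ** 2
        heatmaps[j] = np.exp(-d2 / (2.0 * sigma ** 2))
    return heatmaps

# Example: 17 random joints rendered onto a 64x64 grid.
frame = joints_to_heatmaps(np.random.rand(17, 2) * 64, 64, 64)
print(frame.shape)  # (17, 64, 64)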
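
Likewise, the STA module's layers are only named, not specified, in the abstract. The PyTorch sketch below shows one plausible realization under stated assumptions: a temporal branch that softmax-weights key frames and a spatial branch that produces a per-frame saliency map; the class name, branch design, and (N, C, T, H, W) tensor layout are assumptions.

# Illustrative spatiotemporal attention block (assumed design, not the
# paper's exact STA layer layout).
import torch
import torch.nn as nn

class SpatialTemporalAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Temporal branch: score each frame from its pooled descriptor.
        self.temporal = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, 1))
        # Spatial branch: one-channel saliency map per frame.
        self.spatial = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):  # x: (N, C, T, H, W)
        n, c, t, h, w = x.shape
        # Temporal attention: which frames are key frames.
        frame_desc = x.mean(dim=(3, 4)).transpose(1, 2)  # (N, T, C)
        t_weights = torch.softmax(self.temporal(frame_desc), dim=1)  # (N, T, 1)
        x = x * t_weights.transpose(1, 2).unsqueeze(-1).unsqueeze(-1)
        # Spatial attention: which regions within each frame.
        frames = x.permute(0, 2, 1, 3, 4).reshape(n * t, c, h, w)
        frames = frames * torch.sigmoid(self.spatial(frames))  # (N*T, 1, H, W) map
        return frames.reshape(n, t, c, h, w).permute(0, 2, 1, 3, 4)

x = torch.randn(2, 32, 16, 14, 14)
print(SpatialTemporalAttention(32)(x).shape)  # torch.Size([2, 32, 16, 14, 14])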
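
Finally, a hedged sketch of a multi-stage residual stack in the spirit of MS-Residual: identity or projection shortcuts at every stage keep gradients flowing through a deep 3D CNN, as innovation (3) describes. The stage count, channel widths, and strides below are assumptions, not the network's published configuration.

# Assumed multi-stage residual stack over heat-map clips.
import torch
import torch.nn as nn

class ResidualBlock3D(nn.Module):
    """Standard 3D residual block: out = relu(body(x) + shortcut(x))."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm3d(out_ch))
        # Projection shortcut when the shape changes, identity otherwise.
        self.shortcut = (nn.Identity() if in_ch == out_ch and stride == 1
                         else nn.Conv3d(in_ch, out_ch, 1, stride=stride, bias=False))

    def forward(self, x):
        return torch.relu(self.body(x) + self.shortcut(x))

stages = nn.Sequential(
    ResidualBlock3D(17, 32),             # stage 1: heat-map channels in
    ResidualBlock3D(32, 64, stride=2),   # stage 2: downsample
    ResidualBlock3D(64, 128, stride=2))  # stage 3: downsample again
x = torch.randn(1, 17, 16, 64, 64)       # (N, joints, T, H, W) heat-map clip
print(stages(x).shape)                   # torch.Size([1, 128, 4, 16, 16])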

Keywords