Action Recognition Based on Multi-Level Topological Channel Attention of Human Skeleton

Kai Hu; Chaowen Shen; Tianyan Wang; Shuai Shen; Chengxue Cai; Huaming Huang; Min Xia

doi:10.3390/s23249738

Sensors (Dec 2023)

Affiliations

Kai Hu: School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
Chaowen Shen: School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
Tianyan Wang: School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
Shuai Shen: School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
Chengxue Cai: School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China
Huaming Huang: Department of Physical Education, Nanjing University of Information Science and Technology, Nanjing 210044, China
Min Xia: School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China

DOI: https://doi.org/10.3390/s23249738
Journal volume & issue: Vol. 23, no. 24, p. 9738

Abstract

In action recognition, skeleton data extracted from human poses is valuable because it helps eliminate the negative effects of environmental noise, such as changes in background and lighting conditions. Although graph convolutional networks (GCNs) can learn distinctive action features, they fail to fully utilize prior knowledge of human body structure and the coordination relations between limbs. To address these issues, this paper proposes a Multi-level Topological Channel Attention Network. First, the Multi-level Topology and Channel Attention Module incorporates prior knowledge of human body structure in a coarse-to-fine manner to extract action features effectively. Second, the Coordination Module exploits the contralateral and ipsilateral coordinated movements described in human kinematics. Finally, the Multi-scale Global Spatio-temporal Attention Module captures spatio-temporal features at different granularities and incorporates a causal convolution block and masked temporal attention to prevent non-causal relationships. The method achieves accuracies of 91.9% (Xsub) and 96.3% (Xview) on NTU-RGB+D 60, and 88.5% (Xsub) and 90.3% (Xset) on NTU-RGB+D 120.
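The abstract names two mechanisms for keeping temporal modelling causal: a causal convolution block and masked temporal attention. The sketch below illustrates the general idea of each in plain Python. It is not the authors' implementation. The function names, the one-value-per-frame input, and the zero left-padding are assumptions made for illustration only.

```python
import math


def causal_conv1d(x, kernel):
    """Causal 1D convolution over a sequence of per-frame values.

    Assumption for illustration: kernel[0] weights the current frame and
    kernel[k] weights the frame k steps in the past. Frames before the
    start of the sequence are treated as zeros (left padding), so the
    output at time t never depends on frames later than t.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * x[t - k]
        out.append(acc)
    return out


def masked_temporal_attention(scores):
    """Row-wise softmax over a T x T score matrix with a causal mask.

    scores[i][j] is the raw affinity of frame i towards frame j.
    Entries with j > i are masked out, so frame i receives zero
    attention weight on every later frame.
    """
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]            # frames 0..i only
        m = max(visible)                  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        masked = [0.0] * (len(row) - i - 1)
        weights.append([e / total for e in exps] + masked)
    return weights
```

In both functions, changing a later frame leaves every earlier output unchanged, which is the property that rules out non-causal relationships. A practical model would apply the same masking to multi-channel tensors in a deep learning framework rather than to Python lists.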

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
