IET Computer Vision (Oct 2024)
Multi‐scale skeleton simplification graph convolutional network for skeleton‐based action recognition
Abstract
Human action recognition based on graph convolutional networks (GCNs) is one of the hotspots in computer vision. However, previous methods generally rely on handcrafted graphs, which limits the model's ability to characterise connections between indirectly connected joints; these connections weaken as the distance between joints grows. To address this issue, the authors propose a skeleton-simplification method that reduces the number of joints and the distances between them by merging adjacent joints into simplified joints. A group convolutional block is devised to extract the internal features of the simplified joints. Additionally, the authors enhance the method by introducing multi-scale modelling, which maps the input into sequences at various levels of simplification. Combined with spatial-temporal graph convolution, a multi-scale skeleton simplification GCN for skeleton-based action recognition (M3S-GCN) is proposed for fusing multi-scale skeleton sequences and modelling the connections between joints. Finally, M3S-GCN is evaluated on five benchmarks: NTU RGB+D 60 (C-Sub, C-View), NTU RGB+D 120 (X-Sub, X-Set) and NW-UCLA. Experimental results show that M3S-GCN achieves state-of-the-art performance, with accuracies of 93.0%, 97.0% and 91.2% on the C-Sub, C-View and X-Set benchmarks respectively, validating the effectiveness of the method.
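The skeleton-simplification idea summarised above (merging adjacent joints into simplified joints so that distant joints sit closer together on the coarser graph) can be illustrated with a minimal sketch. The joint grouping and the average-pooling merge below are illustrative assumptions, not the authors' implementation or partition.

```python
# Minimal sketch of skeleton simplification: adjacent joints are merged into
# "simplified" joints, yielding a coarser skeleton at each scale.
# The grouping below is a hypothetical example for a 25-joint NTU RGB+D skeleton.
import torch

COARSE_GROUPS = [
    [0, 1, 20],              # spine / torso
    [2, 3],                  # neck + head
    [4, 5, 6, 7, 21, 22],    # left arm + left hand
    [8, 9, 10, 11, 23, 24],  # right arm + right hand
    [12, 13, 14, 15],        # left leg
    [16, 17, 18, 19],        # right leg
]

def simplify_skeleton(x, groups):
    """Merge adjacent joints into simplified joints by average pooling.

    x:      (N, C, T, V) batch of skeleton sequences
            (batch, channels, frames, joints).
    groups: list of joint-index lists; each list becomes one simplified joint.
    Returns a tensor of shape (N, C, T, len(groups)).
    """
    merged = [x[..., g].mean(dim=-1, keepdim=True) for g in groups]
    return torch.cat(merged, dim=-1)

if __name__ == "__main__":
    x = torch.randn(8, 3, 64, 25)        # 8 clips, xyz coordinates, 64 frames, 25 joints
    coarse = simplify_skeleton(x, COARSE_GROUPS)
    print(coarse.shape)                  # torch.Size([8, 3, 64, 6])
```

Applying several such groupings of increasing coarseness would produce the multi-scale sequences that the proposed network fuses; the feature extraction inside each group is handled by the paper's group convolutional block rather than the simple pooling used here.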
Keywords