Two‐stream spatiotemporal networks for skeleton action recognition

Lei Wang; Jianwei Zhang; Shanmin Yang; Song Gu

doi:10.1049/ipr2.12868

IET Image Processing (Sep 2023)

Affiliations

Lei Wang: School of Aeronautics and Astronautics, Sichuan University, Chengdu, China
Jianwei Zhang: College of Computer Science, Sichuan University, Chengdu, China
Shanmin Yang: School of Computer Science, Chengdu University of Information Technology, Chengdu, China
Song Gu: School of Aeronautical Manufacturing Industry, Chengdu Aeronautic Vocational and Technical College, Chengdu, China

DOI: https://doi.org/10.1049/ipr2.12868
Journal volume & issue: Vol. 17, no. 11
pp. 3358 – 3370

Abstract

Skeleton‐based neural networks have become a focus of human action recognition (HAR) research. Existing skeleton‐based methods, however, do not combine spatial and temporal features well enough to derive effective high‐level representations, so learning discriminative representations of skeleton actions remains a challenging task. This study proposes a novel two‐stream spatiotemporal network (TSTN) that processes spatial and temporal features both separately and jointly to better represent and understand human action. The temporal branch stacks three gated recurrent unit (GRU) blocks in a new architecture to encode temporal correlations from different aspects of human action, yielding high‐level temporal semantic features. The spatial branch encodes spatial features with multiple stacked graph convolutional network (GCN) blocks. Self‐attention mechanisms combined with the graph structure of the skeleton are explored to add weighting and structural hints that further enhance performance. The experimental results verify the effectiveness and superiority of the proposed model in skeleton action recognition; the model reaches state‐of‐the‐art performance on specific datasets.
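
As a rough illustration of what a single graph convolution over a skeleton might look like, the sketch below applies the widely used normalized‐adjacency form, ReLU(D^(-1/2) (A + I) D^(-1/2) X W), in plain Python. This is an assumption for illustration only: the abstract does not state the exact GCN formulation, and the 5‐joint skeleton, the edge list, and the function names are all hypothetical.

```python
import math

def normalized_adjacency(num_joints, edges):
    """Return D^(-1/2) (A + I) D^(-1/2) for an undirected skeleton graph."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop so each joint keeps its own feature
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]

def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

def graph_conv(features, adj_norm, weight):
    """One graph convolution step: ReLU(adj_norm @ features @ weight)."""
    out = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]

# Hypothetical 5-joint skeleton (not from the paper):
# 0 = torso, 1 = head, 2 = left hand, 3 = right hand, 4 = foot
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4)]
ADJ = normalized_adjacency(5, EDGES)

# Per-joint 2-D features for one frame, projected with an identity weight.
frame = [[1.0, 2.0] for _ in range(5)]
hidden = graph_conv(frame, ADJ, [[1.0, 0.0], [0.0, 1.0]])
```

Each output row mixes a joint's own feature with those of its skeletal neighbours. Stacking several such steps, as the abstract describes for the spatial branch, lets information travel across more distant joints.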

Published in IET Image Processing

ISSN: 1751-9659 (Print); 1751-9667 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Photography; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519667
