IEEE Access (Jan 2022)

Two-Stream Spatial Graphormer Networks for Skeleton-Based Action Recognition

  • Xiaolei Li,
  • Junyou Zhang,
  • Shufeng Wang,
  • Qian Zhou

DOI
https://doi.org/10.1109/ACCESS.2022.3206044
Journal volume & issue
Vol. 10
pp. 100426–100437

Abstract


In skeleton-based human action recognition, the Transformer, which models correlations between joint pairs in the global topology, has achieved remarkable results. However, in contrast to the extensive research on learning graph topology in graph convolutional networks (GCNs), Transformer self-attention ignores the topology of the skeleton graph when capturing dependencies between joints. To address this problem, we propose a novel two-stream spatial Graphormer network (2s-SGR), which uses self-attention with custom structural encodings to model joint and bone information and consists of two networks: the joint-stream spatial Graphormer network (Js-SGR) and the bone-stream spatial Graphormer network (Bs-SGR). First, in the Js-SGR, while the Transformer models joint correlations in the global spatial topology, the topology of the joints and the edge information of the bones are introduced into the self-attention through custom structural encodings. At the same time, joint motion information is modeled in spatial-temporal blocks. The added structural and motion information effectively captures dependencies between nodes across frames and enhances the feature representation. Second, for the second-order information of the skeleton, the Bs-SGR adapts to the bone structure by adjusting the custom structural encodings. Finally, the global spatial-temporal features of joints and bones are fused and fed into a classification network to obtain the action recognition result. Extensive experiments on three large-scale datasets, NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics, demonstrate that the proposed 2s-SGR achieves state-of-the-art performance, and ablation experiments validate the effectiveness of its components.
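To make the two central ideas of the abstract concrete, the sketch below (not the authors' code) illustrates (1) self-attention over skeleton joints whose attention logits receive an additive structural bias derived from the graph topology, here a learned embedding of hop distance between joints in the Graphormer style, and (2) late fusion of the joint-stream and bone-stream classification scores. All layer sizes, the hop-distance encoding, and the fusion weight are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StructuralSelfAttention(nn.Module):
    """Single-head self-attention over joints with an additive structural bias.

    The bias is a learned scalar per hop distance between two joints, so the
    attention map is aware of the skeleton graph topology (an assumption about
    how the paper's "custom structural encodings" could be realized).
    """

    def __init__(self, dim: int, max_hops: int = 5):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # One learnable bias value per possible hop distance.
        self.hop_bias = nn.Embedding(max_hops + 1, 1)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, hop_dist: torch.LongTensor) -> torch.Tensor:
        # x: (batch, num_joints, dim); hop_dist: (num_joints, num_joints) integer hops.
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = (q @ k.transpose(-2, -1)) * self.scale           # (batch, V, V)
        attn = attn + self.hop_bias(hop_dist).squeeze(-1)       # add structural bias
        return F.softmax(attn, dim=-1) @ v


def fuse_two_streams(joint_logits: torch.Tensor,
                     bone_logits: torch.Tensor,
                     alpha: float = 0.5) -> torch.Tensor:
    """Late fusion of joint-stream and bone-stream class scores (hypothetical weighting)."""
    return alpha * F.softmax(joint_logits, dim=-1) + (1.0 - alpha) * F.softmax(bone_logits, dim=-1)
```

In this reading, each stream would stack such attention blocks with temporal modules, produce class logits, and the fused softmax scores would give the final prediction; the actual block design and fusion scheme are described in the full paper.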

Keywords