IEEE Access (Jan 2025)

Individual Contribution-Based Spatial-Temporal Attention on Skeleton Sequences for Human Interaction Recognition

  • Xing Liu,
  • Bo Gao

DOI
https://doi.org/10.1109/ACCESS.2024.3525185
Journal volume & issue
Vol. 13
pp. 6463 – 6474

Abstract


Skeleton-based human interaction recognition has gained increasing attention due to its ability to capture complex multi-person dynamics. Significant progress has been made in interaction recognition research, but challenges remain. First, variations in camera positions and viewpoints can cause significant differences in the skeletal data for actions of the same type. Second, capturing both spatial information from skeleton structures and temporal information from interaction sequences is crucial for a discriminative interaction feature representation. Third, the different contributions of each participant, especially in asymmetric interactions, are often overlooked. To address these issues, we propose an innovative method: the individual contribution-based spatial-temporal attention graph convolutional network. We first propose a simple but feasible view transformation method to reduce data mismatch across multi-view cameras. We then design individual contribution weights to measure the importance of each person to the interaction feature representation. Next, a novel spatial-temporal attention module based on these individual contribution weights is proposed to obtain attention-based skeleton data, which are fed to multiple graph convolution layers to extract spatial-temporal features. Additionally, we use a two-stream architecture with joint coordinates and joint motion data as the inputs of the two streams. A weighted fusion strategy is used to obtain the final classification score. Experiments on three different datasets demonstrate that the proposed interaction recognition method achieves satisfactory results compared with other works.
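The two-stream weighted fusion mentioned in the abstract can be illustrated with a minimal sketch. Note the class scores, stream names, and the fusion weight `alpha` below are illustrative assumptions, not values from the paper; in the actual method each stream is a full spatial-temporal graph convolutional network producing per-class scores.

```python
import numpy as np

# Hypothetical per-class scores from the two streams (joint-coordinate
# stream and joint-motion stream); random values stand in for the
# outputs of the two trained networks.
rng = np.random.default_rng(0)
num_classes = 11  # e.g. the mutual-action classes of an interaction dataset
joint_scores = rng.random(num_classes)
motion_scores = rng.random(num_classes)

def weighted_fusion(s1, s2, alpha=0.6):
    """Fuse two streams' class scores with a scalar weight alpha.

    alpha is an assumed hyperparameter balancing the two streams;
    the paper's exact weighting scheme may differ.
    """
    return alpha * s1 + (1.0 - alpha) * s2

fused = weighted_fusion(joint_scores, motion_scores)
prediction = int(np.argmax(fused))  # final predicted interaction class
```

The same pattern extends to softmax-normalized scores or to more than two streams by generalizing `alpha` to a weight vector that sums to one.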

Keywords