IEEE Access (Jan 2020)
A Discriminative Dual-Stream Model With a Novel Sustained Attention Mechanism for Skeleton-Based Human Action Recognition
Abstract
The development of RGB-D sensors, which are widely used for human motion capture, is driving research in skeleton-based human action recognition. In recent work, most attention-based models assign the same weight to all body joints when modeling spatiotemporal features. However, they ignore the fact that different joints contribute differently to human movement, which makes it difficult to obtain high-performance skeleton representations. Therefore, in this article, we propose a novel sustained attention mechanism that adaptively assigns a weight to each body joint in order to extract the key skeleton parts from the global input sequence. We design a two-stream network based on RNNs and CNNs and integrate the sustained attention mechanism into each subnetwork, so that both the body-joint weights and the input-frame weights are learned effectively, resulting in superior performance. During training, the skeleton is randomly transformed to enhance the robustness of the model and reduce overfitting. A series of ablation studies and visualization analyses demonstrates the validity and robustness of the proposed model. Extensive experiments on four benchmark datasets, including challenging interaction datasets, show that the proposed model outperforms recent state-of-the-art methods.
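To make the two ideas in the abstract concrete (per-joint weights within a frame, per-frame weights across the sequence, and random skeleton transformation during training), here is a minimal sketch of generic soft attention over a skeleton sequence. It is not the paper's sustained attention mechanism, whose exact formulation is not given here. The scoring vectors `w` and `v` are hypothetical stand-ins for learned parameters, and the choice of a random rotation about the vertical axis as the "random transformation" is an assumption for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(frame, w):
    """Weight the joints of one frame and pool them into a single feature.

    frame: list of J joints, each an (x, y, z) tuple.
    w: hypothetical 3-element scoring vector (a learned parameter in practice).
    Returns (alpha, pooled): per-joint weights summing to 1, and the
    attention-weighted average of the joint coordinates.
    """
    scores = [sum(wi * ci for wi, ci in zip(w, joint)) for joint in frame]
    alpha = softmax(scores)
    pooled = tuple(sum(a * joint[k] for a, joint in zip(alpha, frame)) for k in range(3))
    return alpha, pooled


def temporal_attention(pooled_frames, v):
    """Weight frames by scoring each pooled frame feature with hypothetical vector v."""
    scores = [sum(vi * ci for vi, ci in zip(v, f)) for f in pooled_frames]
    beta = softmax(scores)
    seq_feature = tuple(sum(b * f[k] for b, f in zip(beta, pooled_frames)) for k in range(3))
    return beta, seq_feature


def random_rotate(sequence, max_deg=30.0, rng=random):
    """Assumed augmentation: rotate all joints about the y axis by one random angle."""
    theta = math.radians(rng.uniform(-max_deg, max_deg))
    c, s = math.cos(theta), math.sin(theta)
    return [[(c * x + s * z, y, -s * x + c * z) for (x, y, z) in frame] for frame in sequence]


if __name__ == "__main__":
    # Toy sequence: 2 frames x 3 joints (made-up coordinates).
    seq = [[(0.0, 1.0, 0.0), (0.5, 0.2, 0.1), (1.0, 0.0, 0.0)],
           [(0.1, 1.0, 0.0), (0.6, 0.3, 0.1), (1.2, 0.0, 0.1)]]
    augmented = random_rotate(seq, rng=random.Random(0))
    per_frame = [joint_attention(f, (1.0, 0.0, 0.0)) for f in augmented]
    beta, feature = temporal_attention([p for _, p in per_frame], (0.0, 1.0, 0.0))
    print("joint weights:", [a for a, _ in per_frame])
    print("frame weights:", beta)
    print("sequence feature:", feature)
```

Rotating the whole sequence by a single angle keeps each joint's height and its distance from the vertical axis unchanged, so the augmented skeleton is the same motion seen from a different viewpoint. In a trained network the scores would come from learned layers rather than fixed dot products.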
Keywords