IEEE Access (Jan 2020)

Optimizing Features Quality: A Normalized Covariance Fusion Framework for Skeleton Action Recognition

  • Guan Huang,
  • Qiuyan Yan

DOI
https://doi.org/10.1109/ACCESS.2020.3037238
Journal volume & issue
Vol. 8
pp. 211869 – 211881

Abstract

Action recognition based on 3D skeleton sequences has gained considerable attention in recent years. Because Covariance Matrix (CM) features effectively represent the spatial and temporal characteristics of skeleton sequences, combining them with a Long Short-Term Memory (LSTM) network is an effective and reasonable route to higher action recognition accuracy. However, the CM features in existing recognition models are computed from raw data either without normalization or with static normalization. Moreover, a CM feature is calculated from all coordinates in one frame, which treats the coordinates on all three axes identically and neglects the relationships among coordinates on the same axis. In this paper, an end-to-end deep learning framework is proposed that includes a normalization layer which dynamically adapts to the data distribution and the inference procedure. After normalization, three covariance feature sequences, one per coordinate axis, are produced from sliding windows and fused into a single fusion matrix by a convolution layer. Finally, the fusion matrix is fed sequentially into an LSTM network to recognize the skeleton action. The novelty of the proposed framework lies in combining adaptive preprocessing and feature fusion with the LSTM network, improving recognition accuracy by optimizing the quality of the features rather than the network construction. In the experiments, the proposed framework is verified on public datasets and on one student action dataset collected from a real classroom. The experimental results demonstrate that the proposed method achieves a significant improvement in accuracy compared to state-of-the-art methods. It can be concluded that the proposed framework not only accurately captures the correlation of joints in the same frame but also effectively expresses the dependences between sequential frames.
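
As a rough illustration of the per-axis covariance step described in the abstract, the sketch below computes one joint-by-joint covariance matrix per coordinate axis for each sliding window. It is not the authors' implementation: the function name `per_axis_window_covariances`, the `(frames, joints, 3)` input layout, the `window` and `stride` parameters, and the choice to take the covariance across joints over the frames of a window are all assumptions made for the example. The adaptive normalization layer, the convolutional fusion, and the LSTM are not shown.

```python
import numpy as np

def per_axis_window_covariances(seq, window, stride):
    """Hypothetical helper (not from the paper).

    seq: array of shape (T, J, 3) holding J joint coordinates over T frames.
    Returns an array of shape (N, 3, J, J): for each of the N sliding
    windows, one J x J covariance matrix per coordinate axis (x, y, z).
    """
    seq = np.asarray(seq, dtype=float)
    num_frames, _, num_axes = seq.shape
    if window < 2 or window > num_frames:
        raise ValueError("window must be between 2 and the number of frames")

    windows = []
    for start in range(0, num_frames - window + 1, stride):
        chunk = seq[start:start + window]            # (window, J, 3)
        per_axis = []
        for axis in range(num_axes):
            coords = chunk[:, :, axis]               # (window, J)
            # rowvar=False: each column (joint) is a variable,
            # each row (frame) is an observation.
            per_axis.append(np.cov(coords, rowvar=False))
        windows.append(np.stack(per_axis))           # (3, J, J)
    return np.stack(windows)                         # (N, 3, J, J)

# Assumed toy input: 20 frames, 5 joints, random coordinates.
rng = np.random.default_rng(0)
features = per_axis_window_covariances(rng.normal(size=(20, 5, 3)),
                                       window=8, stride=4)
```

In the framework the abstract describes, the three per-axis matrices for each window would then be fused by a convolution layer and the resulting sequence passed to the LSTM; keeping the axes separate up to that point is what lets the fusion weigh them differently.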

Keywords