Two-stream temporal enhanced Fisher vector encoding for skeleton-based action recognition

Jun Tang; Baodi Liu; Wenhui Guo; Yanjiang Wang

doi:10.1007/s40747-022-00914-3

Complex & Intelligent Systems (Nov 2022)

Affiliations

Jun Tang: College of Control Science and Engineering, China University of Petroleum (East China)
Baodi Liu: College of Control Science and Engineering, China University of Petroleum (East China)
Wenhui Guo: College of Control Science and Engineering, China University of Petroleum (East China)
Yanjiang Wang: College of Control Science and Engineering, China University of Petroleum (East China)

DOI: https://doi.org/10.1007/s40747-022-00914-3
Journal volume & issue: Vol. 9, no. 3
pp. 3147 – 3159

Abstract

The key to skeleton-based action recognition is extracting discriminative features from skeleton data. Recently, graph convolutional networks (GCNs) have proven highly successful for skeleton-based action recognition. However, existing GCN-based methods focus on extracting robust features while neglecting the information carried by feature distributions. In this work, we introduce Fisher vector (FV) encoding into GCNs to exploit this distributional information. However, because a Gaussian mixture model (GMM) is used to fit the global distribution of features, Fisher vector encoding inevitably loses the temporal information of actions, as our analysis demonstrates. To tackle this problem, we propose a temporal enhanced Fisher vector encoding algorithm (TEFV) that provides a more discriminative visual representation. Compared with FV, our TEFV model not only preserves the temporal information of the entire action but also captures fine-grained spatial configurations and temporal dynamics. Moreover, we propose a two-stream framework (2sTEFV-GCN) that combines the TEFV model with the GCN model to further improve performance. On two large-scale datasets for skeleton-based action recognition, NTU-RGB+D 60 and NTU-RGB+D 120, our model achieves state-of-the-art performance.
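The loss of temporal information under standard FV encoding can be illustrated with a minimal sketch. It uses the widely known Fisher vector formulation for a diagonal-covariance GMM (gradients with respect to means and standard deviations), not the authors' TEFV algorithm. The function name `fisher_vector`, the toy dimensions, and the fixed GMM parameters are all assumptions made for illustration; in practice the GMM would be fitted to training features.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Standard FV encoding of frame features X (T, D) under a diagonal GMM.

    weights: (K,) mixture weights, means: (K, D), sigmas: (K, D) std devs.
    Illustrative sketch only; not the TEFV method from the paper.
    """
    T, D = X.shape
    # Normalised deviation of every frame from every component: (T, K, D)
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log of the weighted Gaussian densities: (T, K)
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(diff ** 2, axis=2)
             - np.sum(np.log(sigmas), axis=1)[None, :]
             - 0.5 * D * np.log(2 * np.pi))
    # Posterior (soft assignment) of each frame to each component
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Gradients w.r.t. means and std devs, summed over ALL frames
    g_mu = np.einsum('tk,tkd->kd', gamma, diff)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = np.einsum('tk,tkd->kd', gamma, diff ** 2 - 1)
    g_sigma /= T * np.sqrt(2 * weights)[:, None]
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])

# Toy data (hypothetical): T frames of D-dimensional features, K components
rng = np.random.default_rng(0)
T, D, K = 30, 4, 3
X = rng.normal(size=(T, D))
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))

fv = fisher_vector(X, weights, means, sigmas)
fv_reversed = fisher_vector(X[::-1], weights, means, sigmas)

print(fv.shape)                      # length is 2 * K * D
print(np.allclose(fv, fv_reversed))  # same encoding for reversed frame order
```

Because every term is a sum over frames, playing the sequence backwards yields the same vector, so two actions that differ only in the order of their frames cannot be told apart from this encoding alone.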

Published in Complex & Intelligent Systems

ISSN: 2199-4536 (Print); 2198-6053 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.springer.com/journal/40747
