Sensors (Nov 2024)
Deep Fusion of Skeleton Spatial–Temporal and Dynamic Information for Action Recognition
Abstract
To address the low recognition rates of traditional depth-information-based action recognition algorithms, an action recognition approach was developed that combines skeleton spatial–temporal and dynamic features with a two-stream convolutional neural network (TS-CNN). Firstly, the skeleton's three-dimensional coordinate system was transformed to obtain the relative positions of the joints. Subsequently, these relative joint coordinates were encoded as a color texture map to construct the spatial–temporal feature descriptor of the skeleton. Physical structure constraints of the human body were also incorporated to enhance inter-class differences. Additionally, the velocity of each joint was estimated and encoded as a color texture map to obtain the skeleton motion feature descriptor. The resulting spatial–temporal and dynamic features were further enhanced using motion saliency and morphology operators to improve their expressive ability. Finally, the enhanced skeleton spatial–temporal and dynamic features were deeply fused via the TS-CNN to perform action recognition. Extensive experiments on the publicly available NTU RGB-D, Northwestern-UCLA, and UTD-MHAD datasets show that the developed approach achieves recognition rates of 86.25%, 87.37%, and 93.75%, respectively, and that it effectively improves the accuracy of action recognition in complex environments compared to state-of-the-art algorithms.
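To make the descriptor construction concrete, the following minimal Python sketch illustrates one plausible reading of the two encoding steps described in the abstract: relative joint coordinates and per-joint velocities are each min-max normalized and mapped to an RGB texture image (rows as frames, columns as joints, channels as the x/y/z components). The function names, the choice of a root joint as reference, and the normalization scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def encode_texture_map(values, out_range=255):
    """Min-max normalize a (frames, joints, 3) array to [0, out_range]
    and return it as a uint8 RGB image: rows are frames, columns are
    joints, channels are the x/y/z components."""
    v_min, v_max = values.min(), values.max()
    scaled = (values - v_min) / (v_max - v_min + 1e-8) * out_range
    return scaled.astype(np.uint8)

def skeleton_descriptors(joints, root_index=0):
    """Build the two descriptors sketched in the abstract.

    joints: (frames, num_joints, 3) array of 3D joint coordinates.
    Returns (spatial_temporal_map, dynamic_map), each an RGB image.
    """
    # Spatial-temporal descriptor: coordinates expressed relative to a
    # reference joint (hypothetical choice: the root/hip joint).
    relative = joints - joints[:, root_index:root_index + 1, :]
    spatial_temporal_map = encode_texture_map(relative)

    # Dynamic descriptor: per-joint velocity estimated as the
    # frame-to-frame difference of joint positions.
    velocity = np.diff(joints, axis=0)
    dynamic_map = encode_texture_map(velocity)

    return spatial_temporal_map, dynamic_map

# Example: a random 40-frame, 25-joint sequence (NTU-style joint count).
seq = np.random.rand(40, 25, 3).astype(np.float32)
st_map, dyn_map = skeleton_descriptors(seq)
print(st_map.shape, dyn_map.shape)  # (40, 25, 3) (39, 25, 3)
```

In this reading, the two texture images would then be fed to the two streams of the TS-CNN for deep fusion; the saliency and morphology enhancement steps are omitted here for brevity.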
Keywords