2D human skeleton action recognition with spatial constraints

Lei Wang; Jianwei Zhang; Wenbing Yang; Song Gu; Shanmin Yang

doi:10.1049/cvi2.12296

IET Computer Vision (Oct 2024)

Affiliations

Lei Wang: School of Aeronautics and Astronautics Sichuan University Chengdu Sichuan China
Jianwei Zhang: College of Computer Science Sichuan University Chengdu Sichuan China
Wenbing Yang: Chengdu Army Equipment Department The 3rd Office of Military Delegate Chengdu Sichuan China
Song Gu: School of Aeronautical Manufacturing Industry Chengdu Aeronautic Polytechnic Chengdu Sichuan China
Shanmin Yang: School of Computer Science Chengdu University of Information Technology Chengdu Sichuan China

DOI: https://doi.org/10.1049/cvi2.12296
Journal volume & issue: Vol. 18, no. 7, pp. 968–981

Abstract

Human actions are predominantly presented in 2D format in video surveillance scenarios, which hinders the accurate determination of action details not apparent in 2D data. Depth estimation can aid human action recognition tasks, enhancing accuracy with neural networks. However, reliance on images for depth estimation requires extensive computational resources and cannot utilise the connectivity between human body structures. Besides, the depth information may not accurately reflect actual depth ranges, necessitating improved reliability. Therefore, a 2D human skeleton action recognition method with spatial constraints (2D‐SCHAR) is introduced. 2D‐SCHAR employs graph convolution networks to process graph‐structured human action skeleton data and comprises three parts: depth estimation, spatial transformation, and action recognition. The first two components, which infer 3D information from 2D human skeleton actions and generate spatial transformation parameters to correct abnormal deviations in action data, support the third to enhance the accuracy of action recognition. The model is designed in an end‐to‐end, multitasking manner, allowing parameter sharing among these three components to boost performance. The experimental results validate the model's effectiveness and superiority in human skeleton action recognition.

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
