An efficient human action recognition framework with pose-based spatiotemporal features

Saeid Agahian; Farhood Negin; Cemal Köse

Engineering Science and Technology, an International Journal (Feb 2020)

Affiliations

Saeid Agahian (corresponding author): Department of Computer Engineering, Faculty of Engineering, Erzurum Technical University, 25050 Erzurum, Turkey
Farhood Negin: Institut Pascal, CNRS, UMR 6602, F-63171 Aubiere, France
Cemal Köse: Department of Computer Engineering, Faculty of Engineering, Karadeniz Technical University, 61080 Trabzon, Turkey

Journal volume & issue: Vol. 23, no. 1
pp. 196 – 203

Abstract

In the past two decades, human action recognition has been among the most challenging tasks in computer vision. Recently, extracting accurate and cost-efficient skeleton information has become feasible thanks to cutting-edge deep learning algorithms and low-cost depth sensors. In this paper, we propose a novel framework to recognize human actions using 3D skeleton information. The main components of the framework are pose representation and encoding. Assuming that human actions can be represented by spatiotemporal poses, we define a pose descriptor consisting of three elements. The first element contains the normalized coordinates of the raw skeleton joints. The second element contains the temporal displacement relative to a predefined temporal offset, and the third element holds the displacement relative to the previous timestamp. The final descriptor of the whole sequence is the concatenation of the frame-wise descriptors. To avoid problems arising from high dimensionality, Principal Component Analysis (PCA) is applied to the descriptors. The resulting descriptors are encoded with a Fisher Vector (FV) representation and then used to train an Extreme Learning Machine (ELM). The performance of the proposed framework is evaluated on three public benchmark datasets, and the proposed method achieved results competitive with other methods in the literature.

Keywords: Skeleton-based, 3D action recognition, Extreme learning machines, RGB-D
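The abstract names the pipeline stages but not their parameters, so the following is only a minimal NumPy sketch of how the stages could fit together, not the authors' implementation. Every specific choice in it is an assumption: joint 0 is treated as the root, normalization divides by the mean joint-to-root distance, the offset displacement is taken against frame t − OFFSET (clamped at the first frame), and OFFSET = 5, PCA_DIM = 16 and N_GAUSS = 4 are hypothetical values. The Fisher Vector is computed over the set of PCA-reduced frame descriptors using a diagonal GMM fitted with plain EM (the common "improved FV" with mean and variance gradients plus power and L2 normalization), and the ELM uses a tanh hidden layer with ridge-regularized least-squares output weights. The data are synthetic toy skeletons, not benchmark sequences.

```python
import numpy as np

# Hypothetical hyperparameters; the abstract does not state the real values.
OFFSET, PCA_DIM, N_GAUSS = 5, 16, 4

def pose_descriptor(seq, offset=OFFSET):
    """seq: (T, J, 3) skeleton joints -> (T, 9*J) frame-wise descriptors."""
    # Element 1: normalized coordinates. Assumed scheme: center on joint 0
    # (taken as the root) and divide by the mean joint-to-root distance.
    centered = seq - seq[:, :1, :]
    p = centered / (np.linalg.norm(centered, axis=2).mean() + 1e-8)
    t = np.arange(len(p))
    d_off = p - p[np.maximum(t - offset, 0)]  # element 2: vs. frame t - offset (assumed direction)
    d_prev = p - p[np.maximum(t - 1, 0)]      # element 3: vs. previous frame
    return np.concatenate([p, d_off, d_prev], axis=2).reshape(len(p), -1)

def fit_pca(X, k):
    """Return the mean and the top-k principal axes of the rows of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def posteriors(X, w, mu, var):
    """Responsibilities of a diagonal-covariance GMM, shape (N, K)."""
    diff = X[:, None, :] - mu[None]
    ll = -0.5 * np.sum(diff**2 / var[None] + np.log(2 * np.pi * var[None]), axis=2)
    ll += np.log(w)[None]
    ll -= ll.max(axis=1, keepdims=True)  # stabilize before exponentiating
    g = np.exp(ll)
    return g / g.sum(axis=1, keepdims=True)

def fit_gmm(X, k, iters=30, seed=0):
    """Plain EM for a diagonal GMM, used as the Fisher Vector vocabulary."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), k, replace=False)]
    var = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        g = posteriors(X, w, mu, var)                          # E-step
        nk = g.sum(axis=0) + 1e-10                             # M-step below
        w, mu = nk / len(X), (g.T @ X) / nk[:, None]
        var = np.maximum((g.T @ X**2) / nk[:, None] - mu**2, 1e-6)
    return w, mu, var

def fisher_vector(X, w, mu, var):
    """Improved FV: mean/variance gradients, then power + L2 normalization."""
    g = posteriors(X, w, mu, var)
    z = (X[:, None, :] - mu[None]) / np.sqrt(var)[None]       # (N, K, D)
    g_mu = np.einsum("nk,nkd->kd", g, z) / (len(X) * np.sqrt(w)[:, None])
    g_var = np.einsum("nk,nkd->kd", g, z**2 - 1) / (len(X) * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

class ELM:
    """Extreme Learning Machine: random hidden layer, least-squares output."""
    def __init__(self, n_hidden=200, reg=1e-2, seed=0):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.classes = np.unique(y)
        targets = (y[:, None] == self.classes[None]).astype(float)  # one-hot
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random, never trained
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Ridge-regularized output weights: (H^T H + reg*I)^-1 H^T targets (assumed variant).
        self.beta = np.linalg.solve(H.T @ H + self.reg * np.eye(self.n_hidden), H.T @ targets)
        return self

    def predict(self, X):
        return self.classes[np.argmax(self._hidden(X) @ self.beta, axis=1)]

# Toy demo on synthetic skeletons (not real benchmark data).
rng = np.random.default_rng(1)
J, T_LEN, N_CLASSES = 20, 40, 3
base = rng.normal(size=(J, 3))
motions = rng.normal(size=(N_CLASSES, J, 3))  # one motion pattern per class

def make_seq(c):
    phase = np.sin(np.linspace(0, 2 * np.pi, T_LEN))[:, None, None]
    return base + phase * motions[c] + 0.05 * rng.normal(size=(T_LEN, J, 3))

labels = np.repeat(np.arange(N_CLASSES), 10)
descs = [pose_descriptor(make_seq(c)) for c in labels]
mean, comps = fit_pca(np.vstack(descs), PCA_DIM)
reduced = [(d - mean) @ comps.T for d in descs]
gmm = fit_gmm(np.vstack(reduced), N_GAUSS)
fvs = np.array([fisher_vector(r, *gmm) for r in reduced])
elm = ELM().fit(fvs, labels)
print("training accuracy:", (elm.predict(fvs) == labels).mean())
```

Treating the frames as the local descriptors that the Fisher Vector pools means every sequence, whatever its length, maps to a fixed vector of size 2 × N_GAUSS × PCA_DIM, which is what a fixed-input classifier such as an ELM requires. The ELM trains quickly because only the output weights are solved for, in closed form.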

Published in Engineering Science and Technology, an International Journal

ISSN: 2215-0986 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Engineering (General). Civil engineering (General)
Website: http://www.journals.elsevier.com/engineering-science-and-technology-an-international-journal/