Learning to recognise 3D human action from a new skeleton‐based representation using deep convolutional neural networks

Huy‐Hieu Pham; Louahdi Khoudour; Alain Crouzil; Pablo Zegers; Sergio A. Velastin

doi:10.1049/iet-cvi.2018.5014

IET Computer Vision (Apr 2019)

Affiliations

Huy‐Hieu Pham: Cerema, Equipe‐projet STI, 1 Avenue du Colonel Roche, 31400 Toulouse, France
Louahdi Khoudour: Cerema, Equipe‐projet STI, 1 Avenue du Colonel Roche, 31400 Toulouse, France
Alain Crouzil: Institut de Recherche en Informatique de Toulouse (IRIT), Université de Toulouse, UPS, 31062 Toulouse, France
Pablo Zegers: Aparnix, La Gioconda 4355, 10B, Las Condes, Santiago, Chile
Sergio A. Velastin: Department of Computer Science, Applied Artificial Intelligence Research Group, University Carlos III de Madrid, 28270 Madrid, Spain

DOI: https://doi.org/10.1049/iet-cvi.2018.5014
Journal volume & issue: Vol. 13, no. 3
pp. 319 – 328

Abstract

Recognising human actions in untrimmed videos is an important and challenging task. An effective three‐dimensional (3D) motion representation and a powerful learning model are two key factors influencing recognition performance. In this study, the authors introduce a new skeleton‐based representation for 3D action recognition in videos. The key idea of the proposed representation is to transform 3D joint coordinates of the human body carried in skeleton sequences into RGB images via a colour encoding process. By normalising the 3D joint coordinates and dividing each skeleton frame into five parts, where the joints are concatenated according to the order of their physical connections, the colour‐coded representation is able to represent spatio‐temporal evolutions of complex 3D motions, independently of the length of each sequence. They then design and train different deep convolutional neural networks based on the residual network architecture on the obtained image‐based representations to learn 3D motion features and classify them into classes. Their proposed method is evaluated on two widely used action recognition benchmarks: MSR Action3D and NTU‐RGB+D, a very large‐scale dataset for 3D human action recognition. The experimental results demonstrate that the proposed method outperforms previous state‐of‐the‐art approaches while requiring less computation for training and prediction.
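
The colour encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `skeleton_to_image`, the 20-joint layout, the joint indices in `BODY_PARTS`, the global min-max normalisation, the rows-as-joints layout, the 32 x 32 output size and the nearest-neighbour resampling are all assumptions made for illustration; the abstract specifies only that coordinates are normalised, that joints are grouped into five parts, and that (x, y, z) is mapped to colour.

```python
import numpy as np

# Hypothetical grouping of a 20-joint skeleton into five body parts.
# The indices are illustrative and do not follow any dataset's joint map.
BODY_PARTS = [
    [0, 1, 2, 3],      # first arm (assumed)
    [4, 5, 6, 7],      # second arm (assumed)
    [8, 9, 10, 11],    # torso (assumed)
    [12, 13, 14, 15],  # first leg (assumed)
    [16, 17, 18, 19],  # second leg (assumed)
]


def skeleton_to_image(seq, size=32):
    """Encode a skeleton sequence as a fixed-size RGB image.

    seq: array of shape (frames, joints, 3) holding x, y, z coordinates.
    Returns a uint8 array of shape (size, size, 3).
    """
    seq = np.asarray(seq, dtype=np.float64)

    # Concatenate joints part by part so neighbouring rows belong together.
    order = [j for part in BODY_PARTS for j in part]
    seq = seq[:, order, :]

    # Assumed normalisation: global min-max scaling to the range [0, 255].
    lo, hi = seq.min(), seq.max()
    scale = hi - lo if hi > lo else 1.0
    pixels = np.rint(255.0 * ((seq - lo) / scale)).astype(np.uint8)

    # Rows are joints, columns are frames, (x, y, z) becomes (R, G, B).
    image = pixels.transpose(1, 0, 2)

    # Nearest-neighbour resampling to a fixed size, so the output shape
    # does not depend on the number of frames in the sequence.
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    return image[rows][:, cols]
```

Under these assumptions, sequences of different lengths map to images of the same shape, which is what allows a fixed-input convolutional network to be trained on them.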

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
