Multi‐stream 3D CNN structure for human action recognition trained by limited data

Vahid Ashkani Chenarlogh; Farbod Razzazi

doi:10.1049/iet-cvi.2018.5088

IET Computer Vision (Apr 2019)

Affiliations

Vahid Ashkani Chenarlogh: Department of Electrical and Computer Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran
Farbod Razzazi: Department of Electrical and Computer Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran

DOI: https://doi.org/10.1049/iet-cvi.2018.5088
Journal volume & issue: Vol. 13, no. 3
pp. 338 – 344

Abstract

The authors proposed a solution for improving training performance in human action recognition when training data are limited. First, they generated four channels of information from each frame, namely the optical flows and the gradients in the horizontal and vertical directions, as input to three‐dimensional (3D) convolutional neural networks (CNNs). They then proposed three architectures: single‐stream, two‐stream, and four‐stream 3D CNNs. In the single‐stream model, all four channels from each frame were applied to a single stream. In the two‐stream architecture, optical flow‐x and optical flow‐y were applied to one stream and gradient‐x and gradient‐y to another. In the four‐stream architecture, each information channel was applied to its own stream. The architectures were evaluated in an action recognition system on IXMAS, a data set recorded simultaneously by five cameras. The four‐stream architecture outperformed the other two, achieving recognition rates of 87.5, 91.66, 91.11, 88.05, and 81.94% for cameras 0–4, respectively (88.05% on average).
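
The channel-to-stream assignment described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the channel order (flow‐x, flow‐y, gradient‐x, gradient‐y), the use of `np.gradient` for the frame gradients, and the zero-valued optical flow placeholders are all assumptions made for illustration, and the 3D CNN streams themselves are not modelled.

```python
import numpy as np

# Assumed channel order for this sketch: [flow_x, flow_y, grad_x, grad_y].
STREAM_GROUPS = {
    1: [[0, 1, 2, 3]],        # single-stream: all four channels together
    2: [[0, 1], [2, 3]],      # two-stream: optical flows / gradients
    4: [[0], [1], [2], [3]],  # four-stream: one channel per stream
}


def frame_gradients(frames):
    """Return (grad_x, grad_y) for a grayscale clip of shape (T, H, W)."""
    # np.gradient returns one array per requested axis, in axis order:
    # axis 1 is vertical (rows), axis 2 is horizontal (columns).
    grad_y, grad_x = np.gradient(frames.astype(np.float32), axis=(1, 2))
    return grad_x, grad_y


def stack_channels(flow_x, flow_y, grad_x, grad_y):
    """Stack the four information channels into a (T, H, W, 4) clip."""
    return np.stack([flow_x, flow_y, grad_x, grad_y], axis=-1)


def split_streams(clip, n_streams):
    """Split a (T, H, W, 4) clip into the inputs of 1, 2, or 4 streams."""
    if n_streams not in STREAM_GROUPS:
        raise ValueError("n_streams must be 1, 2, or 4")
    return [clip[..., idx] for idx in STREAM_GROUPS[n_streams]]


# Usage: a random 8-frame clip, with zeros standing in for optical flow
# (a real pipeline would compute flow with a dedicated estimator).
frames = np.random.rand(8, 32, 32)
grad_x, grad_y = frame_gradients(frames)
flow_placeholder = np.zeros_like(grad_x)
clip = stack_channels(flow_placeholder, flow_placeholder, grad_x, grad_y)
for n in (1, 2, 4):
    print(n, [s.shape for s in split_streams(clip, n)])
```

Each element returned by `split_streams` would feed one 3D CNN stream; how the stream outputs are fused is not specified in the abstract and is left out here.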

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
