Modelling a Spatial-Motion Deep Learning Framework to Classify Dynamic Patterns of Videos

Sandeli Priyanwada Kasthuri Arachchi; Timothy K. Shih; Noorkholis Luthfil Hakim

doi:10.3390/app10041479

Applied Sciences (Feb 2020)

Affiliations

Sandeli Priyanwada Kasthuri Arachchi: Department of Computer Science and Information Engineering, National Central University, Taoyuan 32001, Taiwan
Timothy K. Shih: Department of Computer Science and Information Engineering, National Central University, Taoyuan 32001, Taiwan
Noorkholis Luthfil Hakim: Department of Computer Science and Information Engineering, National Central University, Taoyuan 32001, Taiwan

Journal volume & issue: Vol. 10, no. 4, p. 1479

Abstract

Video classification is an essential process for analyzing the pervasive semantic information of video content in computer vision. Traditional hand-crafted features are insufficient for classifying complex video information because visual content can look similar under different illumination conditions. Prior studies of video classification focused on the relationship between the standalone streams themselves. In this paper, leveraging deep learning, we propose a two-stream neural network concept named state-exchanging long short-term memory (SE-LSTM). By exchanging state between the spatial and motion streams, the SE-LSTM can classify dynamic patterns of videos using appearance and motion features. The SE-LSTM extends the standard LSTM by exchanging information with the previous cell states of both the appearance and motion streams. We propose a novel two-stream model, Dual-CNNSELSTM, which combines the SE-LSTM concept with a Convolutional Neural Network, and we validate the architecture on various video datasets. The experimental results demonstrate that the proposed two-stream Dual-CNNSELSTM architecture significantly outperforms the compared models, achieving accuracies of 81.62%, 79.87%, and 69.86% on the hand gestures, fireworks displays, and HMDB51 datasets, respectively. Furthermore, the overall results indicate that the proposed model is best suited to classifying dynamic patterns against static backgrounds.
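
The state-exchanging idea can be illustrated with a minimal NumPy sketch. The abstract does not specify the exchange rule, so this sketch assumes the simplest reading: at each time step, each stream's LSTM cell reads the other stream's previous cell state. The names `LSTMCell` and `se_lstm_forward`, the random untrained weights, and the feature sizes are all hypothetical and chosen only for illustration; the per-frame inputs stand in for CNN features of the appearance and motion streams.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """A standard LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        # One stacked weight matrix covering the input, forget, output,
        # and candidate gates.
        self.W = 0.1 * rng.standard_normal((4 * hidden_size, input_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)

    def step(self, x, h_prev, c_prev):
        z = self.W @ np.concatenate([x, h_prev]) + self.b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c


def se_lstm_forward(appearance_seq, motion_seq, hidden_size=8, seed=0):
    """Run two LSTM streams over per-frame features and exchange cell states.

    ASSUMPTION: "state exchanging" is modelled here as a plain swap, where
    each stream consumes the other stream's previous cell state. The
    published model may combine the states differently.
    """
    rng = np.random.default_rng(seed)
    app_cell = LSTMCell(appearance_seq.shape[1], hidden_size, rng)
    mot_cell = LSTMCell(motion_seq.shape[1], hidden_size, rng)

    h_a = np.zeros(hidden_size)
    c_a = np.zeros(hidden_size)
    h_m = np.zeros(hidden_size)
    c_m = np.zeros(hidden_size)

    for x_a, x_m in zip(appearance_seq, motion_seq):
        # Both updates use the states from the previous time step.
        h_a_next, c_a_next = app_cell.step(x_a, h_a, c_m)  # reads motion cell state
        h_m_next, c_m_next = mot_cell.step(x_m, h_m, c_a)  # reads appearance cell state
        h_a, c_a, h_m, c_m = h_a_next, c_a_next, h_m_next, c_m_next

    # Concatenated final hidden states, e.g. as input to a classifier head.
    return np.concatenate([h_a, h_m])
```

Calling `se_lstm_forward` with two arrays of shape `(frames, features)` returns one vector of length `2 * hidden_size`. Because of the swap, the appearance half of that vector also depends on the motion inputs from earlier frames, which is the coupling between streams that the abstract describes.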

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
