Emerging Science Journal (Feb 2021)

Human Action Recognition in Videos using Convolution Long Short-Term Memory Network with Spatio-Temporal Networks

  • Ashok Sarabu,
  • Ajit Kumar Santra

DOI
https://doi.org/10.28991/esj-2021-01254
Journal volume & issue
Vol. 5, no. 1
pp. 25 – 33

Abstract

Read online

Two-stream convolutional networks plays an essential role as a powerful feature extractor in human action recognition in videos. Recent studies have shown the importance of two-stream Convolutional Neural Networks (CNN) to recognize human action recognition. Recurrent Neural Networks (RNN) has achieved the best performance in video activity recognition combining CNN. Encouraged by CNN's results with RNN, we present a two-stream network with two CNNs and Convolution Long-Short Term Memory (CLSTM). First, we extricate Spatio-temporal features using two CNNs using pre-trained ImageNet models. Second, the results of two CNNs from step one are combined and fed as input to the CLSTM to get the overall classification score. We also explored the various fusion function performance that combines two CNNs and the effects of feature mapping at different layers. And, conclude the best fusion function along with layer number. To avoid the problem of overfitting, we adopt the data augmentation techniques. Our proposed model demonstrates a substantial improvement compared to the current two-stream methods on the benchmark datasets with 70.4% on HMDB-51 and 95.4% on UCF-101 using the pre-trained ImageNet model. Doi: 10.28991/esj-2021-01254 Full Text: PDF

Keywords