IET Computer Vision (Dec 2017)

Fully convolutional networks for action recognition

  • Sheng Yu,
  • Yun Cheng,
  • Li Xie,
  • Shao‐Zi Li

DOI
https://doi.org/10.1049/iet-cvi.2017.0005
Journal volume & issue
Vol. 11, no. 8
pp. 744 – 749

Abstract

Human action recognition is an important and challenging topic in computer vision. Recently, convolutional neural networks (CNNs) have achieved impressive results on many image recognition tasks. CNNs usually contain millions of parameters, which makes them prone to overfitting when trained on small datasets. As a result, CNNs have not shown superior performance over traditional methods for action recognition. In this study, the authors design a novel two‐stream fully convolutional network architecture for action recognition that significantly reduces the number of parameters while maintaining performance. To exploit spatial‐temporal features, a linear weighted fusion method is used to fuse the feature maps of the two‐stream networks, and a video pooling method is adopted to construct video‐level features. The authors also demonstrate that improved dense trajectories have a significant impact on action recognition. The authors’ method achieves state‐of‐the‐art performance on two challenging datasets, UCF101 (93.0%) and HMDB51 (70.2%).
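
The abstract names two operations, linear weighted fusion of the two streams' feature maps and video pooling, without giving their exact form. The following is a minimal illustrative sketch only, not the authors' implementation. It assumes a single scalar fusion weight `w`, feature maps of identical shape from both streams, and max or mean pooling over the frame axis. The function names `fuse_streams` and `video_pool` and all parameter values are hypothetical.

```python
import numpy as np


def fuse_streams(spatial, temporal, w=0.5):
    """Linear weighted fusion of two feature maps: w * spatial + (1 - w) * temporal.

    The scalar weight `w` is an assumption; the abstract does not specify
    how the fusion weights are defined or chosen.
    """
    spatial = np.asarray(spatial, dtype=float)
    temporal = np.asarray(temporal, dtype=float)
    if spatial.shape != temporal.shape:
        raise ValueError("feature maps must have the same shape")
    return w * spatial + (1.0 - w) * temporal


def video_pool(frame_features, mode="max"):
    """Pool per-frame features of shape (T, ...) into one video-level feature.

    Max and mean pooling over the frame axis are assumed choices here;
    the abstract only says that a video pooling method is adopted.
    """
    frame_features = np.asarray(frame_features, dtype=float)
    if mode == "max":
        return frame_features.max(axis=0)
    if mode == "mean":
        return frame_features.mean(axis=0)
    raise ValueError("mode must be 'max' or 'mean'")


if __name__ == "__main__":
    # Hypothetical shapes: 8 frames, 16 channels, 7x7 spatial grid.
    rng = np.random.default_rng(0)
    spatial_maps = rng.standard_normal((8, 16, 7, 7))
    temporal_maps = rng.standard_normal((8, 16, 7, 7))
    fused = fuse_streams(spatial_maps, temporal_maps, w=0.6)
    video_feature = video_pool(fused, mode="max")
    print(video_feature.shape)
```

In this sketch, fusion is applied frame by frame before pooling, so the video-level feature keeps the per-frame feature-map shape with the frame axis removed. Whether the paper fuses before or after pooling is not stated in the abstract.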

Keywords