IEEE Access (Jan 2020)
Action Recognition Using High Temporal Resolution 3D Neural Network Based on Dilated Convolution
Abstract
3D convolutional neural networks (CNNs), an important class of deep learning models, perform well at recognizing actions in videos. However, 3D CNNs usually down-sample along the temporal dimension, which loses temporal information. To retain more temporal information from the videos, this work proposes a new model based on the Inflated 3D ConvNet (I3D), named I3D-T. Instead of down-sampling along the temporal dimension, the proposed model applies dilated convolution along that dimension to enlarge the receptive field. In addition, a non-local feature gating block is designed to learn the correlations between different feature maps. Experimental results show that the proposed I3D-T achieves state-of-the-art performance. Using RGB frames as input, the action recognition accuracies are 95% and 74.8% on the public UCF101 and HMDB-51 datasets, respectively.
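The trade-off between temporal down-sampling and temporal dilation can be illustrated with a small receptive-field calculation. The sketch below is not the I3D-T architecture: the two three-layer stacks, the 64-frame clip length, and the helper names `temporal_receptive_field` and `temporal_output_length` are hypothetical and chosen only for illustration. It assumes "same" padding along the time axis.

```python
import math

# Each layer is described along the time axis only:
# (kernel_size, stride, dilation). These stacks are illustrative
# assumptions, not the layer configuration of I3D or I3D-T.
STRIDED_STACK = [(3, 2, 1), (3, 2, 1), (3, 1, 1)]   # down-samples in time
DILATED_STACK = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]   # dilates instead


def temporal_receptive_field(layers):
    """Number of input frames seen by one output time step."""
    rf, jump = 1, 1  # jump = spacing of adjacent outputs in input frames
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


def temporal_output_length(num_frames, layers):
    """Output time steps, assuming 'same' padding in every layer."""
    length = num_frames
    for _kernel, stride, _dilation in layers:
        length = math.ceil(length / stride)
    return length


if __name__ == "__main__":
    for name, stack in [("strided", STRIDED_STACK), ("dilated", DILATED_STACK)]:
        print(name,
              "receptive field:", temporal_receptive_field(stack),
              "output steps:", temporal_output_length(64, stack))
```

Under these assumptions both stacks cover 15 input frames per output step, but the strided stack reduces a 64-frame clip to 16 time steps while the dilated stack keeps all 64, which is the sense in which dilation preserves temporal resolution.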
Keywords