Jisuanji kexue yu tansuo (Mar 2021)
Context Information Fusion Method for Temporal Action Proposals
Abstract
In human activity localization and recognition in videos, existing temporal action proposal methods do not adequately handle long-term dependencies, which lowers the recall rate of the proposals. To address this problem, this paper proposes a temporal action proposal method based on context information fusion. First, the spatiotemporal features of video units are extracted by a 3D convolutional network. Then, a bidirectional recurrent network models the contextual relationships between units and predicts the temporal action proposals. Because the gated recurrent unit (GRU) has a large number of parameters and suffers from vanishing gradients, a simplified GRU (S-GRU) is proposed. In the S-GRU, the gating structure is controlled by the input features, which improves parallel computing capability, and a weighted average is introduced so that the unit adaptively fuses historical information with information from the current time step. Finally, experimental results on the Thumos14 dataset demonstrate that the temporal action proposal method based on the bidirectional S-GRU improves the recall rate of proposals.
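The S-GRU idea summarized above can be illustrated with a minimal sketch. The abstract does not give the cell equations, so everything concrete here is an assumption made for illustration: the gate is taken to be a sigmoid of the input only, the candidate state a tanh of the input only, and the weighted average the convex combination h_t = (1 - z_t) * h_{t-1} + z_t * c_t. The function and parameter names (`s_gru_layer`, `bidirectional_s_gru`, `init_params`, `W_z`, `W_h`) are hypothetical, and the input stands in for per-unit features from a 3D convolutional network.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(rng, in_dim, hidden_dim):
    """Hypothetical helper: small random weights and zero biases."""
    W_z = 0.1 * rng.standard_normal((in_dim, hidden_dim))
    W_h = 0.1 * rng.standard_normal((in_dim, hidden_dim))
    return W_z, np.zeros(hidden_dim), W_h, np.zeros(hidden_dim)


def s_gru_layer(x, W_z, b_z, W_h, b_h, reverse=False):
    """Run an assumed simplified GRU over x of shape (T, D); returns (T, H)."""
    # Assumption: gate and candidate depend only on the input, so they can be
    # computed for all time steps at once (the source of the parallelism).
    z = sigmoid(x @ W_z + b_z)   # gate values, shape (T, H)
    c = np.tanh(x @ W_h + b_h)   # candidate states, shape (T, H)

    T, H = z.shape
    h = np.zeros(H)
    out = np.zeros((T, H))
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        # Weighted average of the history (h) and the current information (c).
        h = (1.0 - z[t]) * h + z[t] * c[t]
        out[t] = h
    return out


def bidirectional_s_gru(x, fwd_params, bwd_params):
    """Concatenate forward and backward passes, giving shape (T, 2H)."""
    forward = s_gru_layer(x, *fwd_params)
    backward = s_gru_layer(x, *bwd_params, reverse=True)
    return np.concatenate([forward, backward], axis=1)
```

Under these assumptions, calling `bidirectional_s_gru(x, init_params(rng, D, H), init_params(rng, D, H))` on a `(T, D)` feature array yields a `(T, 2H)` array of context features, one row per video unit. Only the elementwise averaging step remains sequential; a proposal-scoring head, which this sketch omits, would sit on top of that output.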
Keywords