IET Computer Vision (Dec 2014)
Human action recognition using weighted pooling
Abstract
Pooling strategies, such as max pooling and sum pooling, have been widely used to obtain global representations of action videos. However, these pooling strategies have several disadvantages. First, they are easily affected by unwanted local features from the background, by the absence of discriminative local features, and by the number of times an action is periodically repeated by the actor. Second, most pooling strategies build the global representation from local features alone, so the resulting representation captures few mid-level features. In this study, the authors propose a novel weighted pooling strategy based on an actionlet representation for action recognition. Actionlets are defined as the movements of large body parts, such as the legs, arms and head, and they capture rich mid-level features for action representation. In addition, the authors’ method incorporates the distribution information of actionlets into the pooling procedure. Specifically, each actionlet is assigned a pooling weight that determines its importance in the final video representation. To learn these weights, the authors propose a novel discriminative learning algorithm that captures discriminative information for the pooling operation. They evaluate their weighted pooling on three datasets: the KTH actions dataset, the UCF sports dataset and the Youtube actions dataset. Experimental results show the effectiveness of the proposed method.
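The contrast between max pooling, sum pooling and weighted pooling can be illustrated with a minimal sketch. It assumes each actionlet has already been encoded as a fixed-length feature vector, and it takes the per-actionlet weights as given rather than learning them; the authors' discriminative weight-learning algorithm is not reproduced here. The function names and the choice to normalise by the sum of the weights are illustrative assumptions, not the paper's formulation.

```python
from typing import List, Sequence

Vector = List[float]


def max_pool(features: Sequence[Sequence[float]]) -> Vector:
    """Element-wise maximum over the per-actionlet feature vectors."""
    return [max(column) for column in zip(*features)]


def sum_pool(features: Sequence[Sequence[float]]) -> Vector:
    """Element-wise sum; grows with the number of (repeated) actionlets."""
    return [sum(column) for column in zip(*features)]


def weighted_pool(features: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> Vector:
    """Hypothetical weighted pooling: sum_i w_i * f_i / sum_i w_i.

    The weights are supplied by the caller here (an assumption); in the
    paper they are learned with a discriminative algorithm.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per actionlet")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dim = len(features[0])
    pooled = [0.0] * dim
    for feature, weight in zip(features, weights):
        for k in range(dim):
            pooled[k] += weight * feature[k]
    # Normalising by the total weight keeps the result independent of how
    # many times the same weighted actionlets are repeated.
    return [value / total for value in pooled]


if __name__ == "__main__":
    # Two informative actionlets and one strong background-like feature.
    feats = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
    w = [0.5, 0.5, 0.0]  # hand-picked weights that suppress the background
    print(max_pool(feats))          # [3.0, 3.0]  dominated by background
    print(sum_pool(feats))          # [4.0, 4.0]
    print(weighted_pool(feats, w))  # [0.5, 0.5]
```

In this toy setting, max and sum pooling are both dominated by the background-like vector, while a zero weight removes its influence. Doubling the input (as if the action were performed twice) doubles the sum-pooled vector but leaves the normalised weighted result unchanged, which mirrors the two weaknesses the abstract attributes to conventional pooling.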