Applied Sciences (Dec 2021)
A Data Augmentation Method for Skeleton-Based Action Recognition with Relative Features
Abstract
In recent years, skeleton-based human action recognition (HAR) approaches using convolutional neural network (CNN) models have made tremendous progress in computer vision applications. However, two challenges remain: using relative features to depict human actions, and preventing overfitting when the CNN model is trained on only a few samples. In this paper, a new motion image is introduced to transform spatial-temporal motion information into an image-based representation. For each skeleton sequence, three relative features are extracted to describe human actions: relative coordinates, immediate displacement, and immediate motion orientation. In particular, the relative coordinates introduced in our paper not only depict the spatial relations of human skeleton joints but also provide long-term temporal information. To address the problem of small sample sizes, a data augmentation strategy consisting of three simple but effective data augmentation methods is proposed to expand the training set. Because the generated color images are small, a shallow CNN model is sufficient to extract deep features from the generated motion images. The method was evaluated on two small-scale but challenging skeleton datasets, scoring 96.59% on the Florence 3D Actions dataset and 97.48% on the UTkinect-Action 3D dataset. The results show that the proposed method achieves competitive performance compared with state-of-the-art methods. Furthermore, the augmentation strategy proposed in this paper effectively solves the overfitting problem and can be widely adopted in skeleton-based action recognition.
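The three relative features named above can be illustrated with a minimal sketch. The abstract does not give their exact definitions, so everything here is an assumption made for illustration: relative coordinates are taken as each joint's offset from a reference joint in the first frame, immediate displacement as the frame-to-frame difference of each joint, and immediate motion orientation as the unit vector of that displacement. The function name `relative_features` and the parameter `ref_joint` are hypothetical and do not come from the paper.

```python
import math


def relative_features(seq, ref_joint=0):
    """Compute three illustrative relative features for a skeleton sequence.

    seq: list of frames; each frame is a list of (x, y, z) joint tuples.
    ref_joint: index of the reference joint (hypothetical parameter).
    The definitions below are assumptions, not the paper's formulas.
    """
    anchor = seq[0][ref_joint]  # reference joint position in the first frame
    rel, disp, orient = [], [], []
    for t, frame in enumerate(seq):
        # Relative coordinates: offset of every joint from the fixed anchor.
        rel.append([tuple(c - a for c, a in zip(j, anchor)) for j in frame])
        if t == 0:
            continue  # displacement needs a previous frame
        # Immediate displacement: difference from the previous frame.
        d = [tuple(c - p for c, p in zip(j, pj))
             for j, pj in zip(frame, seq[t - 1])]
        disp.append(d)
        # Immediate motion orientation: displacement scaled to unit length.
        frame_orient = []
        for v in d:
            norm = math.sqrt(sum(c * c for c in v))
            if norm > 0:
                frame_orient.append(tuple(c / norm for c in v))
            else:
                frame_orient.append((0.0, 0.0, 0.0))  # joint did not move
        orient.append(frame_orient)
    return rel, disp, orient


# Toy sequence: 2 frames, 2 joints.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)],
]
rel, disp, orient = relative_features(toy)
```

For a sequence of T frames this yields T frames of relative coordinates and T - 1 frames each of displacement and orientation. How these values are normalized and mapped to the color channels of a motion image is not specified in the abstract and is left out of the sketch.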
Keywords