IEEE Access (Jan 2025)
Skeleton-Based Data Augmentation for Sign Language Recognition Using Adversarial Learning
Abstract
In recent years, visual-based sign language recognition (SLR) has become an active research area with the advancement of deep learning. However, sign language data are difficult to collect, and many datasets suffer from data scarcity and class imbalance, which lead to overfitting and reduced accuracy. Data augmentation is commonly used to address these problems, but model training and data augmentation are usually performed independently, and the augmentation parameters are difficult to tune. We therefore focus on visual-based SLR using skeletal data and propose an adversarial learning SLR model called Adversarial Vulnerability-Seeking Networks (AVSN), which jointly trains two processes that are normally independent: data augmentation and model training. AVSN is particularly applicable in scenarios where diverse and extensive sign language datasets are not available. For example, when developing SLR systems for lesser-known sign languages or for specialized vocabularies used in specific professional contexts, AVSN can improve model performance by generating high-quality, diverse training data. In AVSN, the generator produces hard adversarial data intended to mislead the recognition model, which acts as the discriminator, while the discriminator learns from both raw and adversarial data. In other words, the generator exposes the vulnerabilities of the discriminator, and the discriminator improves its performance by learning from both types of data. We evaluated the proposed method from two aspects: the performance of the model trained with data augmentation and the quality of the augmented data. First, we evaluated model performance using common metrics such as accuracy and F-score. The proposed method achieved an improvement of 0.58% in accuracy and 0.017 in F-score compared to the model without data augmentation. Next, we quantitatively evaluated the quality of the augmented data using metrics such as FID, variety, multimodality, affinity, and diversity. The proposed method succeeded in enhancing the novelty of the data, which contributed to improved generalization of the model.
Keywords