IEEE Access (Jan 2023)
TY-Net: Transforming YOLO for Hand Gesture Recognition
Abstract
Hand gesture recognition is a rapidly expanding field with diverse applications, and skeleton-based methods are gaining popularity due to their potential for lightweight execution on embedded devices. However, ensuring robustness and accuracy in both gesture classification and temporal localization is critical for any gesture recognition system to be successful. In this paper, we propose a novel skeleton-based approach to online gesture recognition that draws inspiration from the YOLO object detection model. Specifically, we introduce a transformer-based architecture that directly predicts both gesture classes and gesture boundaries from a sliding-window input. The model is trained end to end using a proposed loss function that focuses on samples containing gesture centers and learns to predict boundaries in the temporal domain analogous to object centers and spatial bounding boxes in YOLO object detection. To evaluate the effectiveness of our method, we conduct experiments on two publicly available continuous hand gesture datasets: SHREC’22 and IPN Hand. Our method outperforms the skeleton-based approaches from the SHREC’22 online gesture recognition contest and achieves competitive results compared to approaches utilizing alternative input modalities on the IPN Hand dataset.
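To make the YOLO analogy concrete, the sketch below shows one possible way such a model could be structured: a transformer encoder over a sliding window of skeleton frames, with a head that jointly outputs class scores, a temporal gesture-center offset, and a gesture duration (the 1-D counterparts of YOLO's box center and size). All layer sizes, the pooling scheme, and the output parametrization here are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of a YOLO-style temporal prediction head on a transformer encoder.
# Dimensions, window length, and output parametrization are illustrative assumptions only.
import torch
import torch.nn as nn

class TemporalGestureDetector(nn.Module):
    def __init__(self, num_joints=21, coord_dim=3, d_model=128, num_classes=14, window_len=32):
        super().__init__()
        self.embed = nn.Linear(num_joints * coord_dim, d_model)        # per-frame skeleton embedding
        self.pos = nn.Parameter(torch.zeros(1, window_len, d_model))   # learned positional encoding
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=3)
        # One prediction per window: class scores plus gesture center and duration,
        # analogous to YOLO's class + box-center + box-size outputs, but in 1-D time.
        self.head = nn.Linear(d_model, num_classes + 2)

    def forward(self, x):                       # x: (batch, window_len, num_joints * coord_dim)
        h = self.embed(x) + self.pos
        h = self.encoder(h).mean(dim=1)         # pool encoded frames over time
        out = self.head(h)
        class_logits = out[:, :-2]
        center = torch.sigmoid(out[:, -2])      # gesture center, normalized to [0, 1] within the window
        duration = torch.exp(out[:, -1])        # positive gesture length in window-relative units
        return class_logits, center, duration

# Usage on a random sliding-window batch (shapes are illustrative only).
if __name__ == "__main__":
    model = TemporalGestureDetector()
    window = torch.randn(8, 32, 21 * 3)         # 8 windows of 32 frames, 21 joints x 3 coordinates
    logits, center, duration = model(window)
    print(logits.shape, center.shape, duration.shape)  # (8, 14) (8,) (8,)
```

In this reading, the loss described in the abstract would supervise the class, center, and duration outputs only on windows that contain a gesture center, mirroring how YOLO assigns box regression targets to the grid cell containing an object center.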
Keywords