IEEE Access (Jan 2019)
GestureVLAD: Combining Unsupervised Features Representation and Spatio-Temporal Aggregation for Doppler-Radar Gesture Recognition
Abstract
In this paper, we propose a novel framework for processing Doppler-radar signals for hand gesture recognition. Doppler-radar sensors offer several advantages over other emerging sensing modalities, including low development costs and high sensitivity, which allows subtle gestures to be captured with precision. Furthermore, they are well suited to ubiquitous deployment and can be conveniently embedded into different devices. In this context, current recognition methods still rely on deep CNN-LSTM and 3D CNN-LSTM architectures, which require large amounts of labelled data to optimize millions of parameters, as well as significant computational resources for inference; this limits their deployment. Moreover, subtle gesture recognition is a challenging task because of the high variability of gestures across different subjects. To overcome the challenges of the recognition task and the limitations of current methods, we propose a shallow learning approach for gesture recognition based on an unsupervised range-Doppler feature representation combined with learnable pooling aggregation via NetVLAD. The proposed framework encodes valuable information across time and yields features that are highly discriminative for hand gesture recognition. Experiments on publicly available Doppler-radar data show that the proposed framework outperforms state-of-the-art approaches in both recognition accuracy and speed for sequence-level hand gesture classification.
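To make the aggregation step more concrete, the following is a minimal NumPy sketch of a generic NetVLAD forward pass over a sequence of per-frame feature vectors. It assumes the unsupervised range-Doppler features have already been extracted as one D-dimensional vector per frame. The function name `netvlad_aggregate`, the frame count, the feature dimension, the number of clusters K, and the initialization of the assignment weights from the cluster centres (following the original NetVLAD formulation) are illustrative assumptions, not the configuration used in this paper. Random data stands in for real radar features.

```python
import numpy as np


def netvlad_aggregate(x, centers, w, b, eps=1e-12):
    """Hypothetical NetVLAD forward pass (sketch, not the paper's code).

    x:       (N, D) per-frame feature vectors (N frames of one gesture)
    centers: (K, D) cluster centres c_k
    w, b:    (K, D) weights and (K,) biases of the soft-assignment layer
    Returns a fixed-length, L2-normalized descriptor of size K * D.
    """
    # Soft assignment of each frame to each cluster: softmax over K.
    logits = x @ w.T + b                          # (N, K)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)             # each row sums to 1

    # Residuals of every frame with respect to every centre: (N, K, D).
    residuals = x[:, None, :] - centers[None, :, :]

    # Weighted sum of residuals over time gives one vector per cluster.
    v = (a[:, :, None] * residuals).sum(axis=0)   # (K, D)

    # Intra-normalization (per cluster), then global L2 normalization.
    v /= np.linalg.norm(v, axis=1, keepdims=True) + eps
    v = v.reshape(-1)
    v /= np.linalg.norm(v) + eps
    return v


# Usage with placeholder sizes: 40 frames, 32-dim features, 8 clusters.
rng = np.random.default_rng(0)
frames = rng.standard_normal((40, 32))
centers = rng.standard_normal((8, 32))
alpha = 10.0                                  # assumed sharpness parameter
w = 2.0 * alpha * centers                     # w_k = 2 * alpha * c_k
b = -alpha * (centers ** 2).sum(axis=1)       # b_k = -alpha * ||c_k||^2
descriptor = netvlad_aggregate(frames, centers, w, b)
print(descriptor.shape)  # (256,)
```

In a trained model the centres, weights, and biases would be learned end to end. Because the residuals are summed over frames, the output length (K × D) does not depend on the number of frames, so sequences of different durations map to descriptors that a single classifier can consume.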
Keywords