IEEE Access (Jan 2024)

Toward Intuitive 3D Interactions in Virtual Reality: A Deep Learning-Based Dual-Hand Gesture Recognition Approach

  • Trudi Di Qi
  • Franceli L. Cibrian
  • Meghna Raswan
  • Tyler Kay
  • Hector M. Camarillo-Abad
  • Yuxin Wen

DOI
https://doi.org/10.1109/ACCESS.2024.3400295
Journal volume & issue
Vol. 12
pp. 67438–67452

Abstract

Dual-hand gesture recognition is crucial for intuitive 3D interactions in virtual reality (VR), allowing users to interact with virtual objects naturally through gestures performed with both handheld controllers. While deep learning and sensor-based technology have proven effective in recognizing single-hand gestures for 3D interactions, dual-hand gesture recognition for VR interactions remains underexplored. In this work, we introduce CWT-CNN-TCN, a novel deep learning model that combines a 2D Convolutional Neural Network (CNN) applied to Continuous Wavelet Transformation (CWT) features with a Temporal Convolutional Network (TCN). The model simultaneously extracts features from the time-frequency domain and captures long-term dependencies, using 3D position and orientation data from handheld controllers for gesture classification. To evaluate the proposed model, we designed 13 dual-hand gestures representing fundamental 3D interaction tasks (translation, rotation, scaling, and selection) and collected data from 26 participants using a VR system. The model's performance was rigorously tested under various hand-tracking scenarios, including dual-hand versus single-hand inputs and complete versus partial motion features. Benchmarking against four state-of-the-art neural networks shows that CWT-CNN-TCN reliably detects dual-hand gestures with limited tracking data and outperforms all four baselines. These results pave the way for a dual-hand gesture-based interface that enriches intuitive 3D interaction in VR.
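
The abstract does not reproduce an implementation, but the architecture it names maps onto standard building blocks. The sketch below is a hypothetical illustration, not the authors' code: it assumes PyTorch and the PyWavelets library, a "morl" mother wavelet, 14 input channels (3D position plus quaternion orientation for each of the two controllers), and illustrative layer widths; the TCN stage is approximated with dilated 1D convolutions, and the helper name to_scalogram is invented here. Only the 13-class output comes directly from the abstract.

    import numpy as np
    import pywt
    import torch
    import torch.nn as nn

    def to_scalogram(signal, scales=np.arange(1, 33), wavelet="morl"):
        """CWT of one 1-D motion channel -> (len(scales), T) time-frequency map."""
        coeffs, _ = pywt.cwt(signal, scales, wavelet)
        return np.abs(coeffs).astype(np.float32)

    class CWTCNNTCN(nn.Module):
        def __init__(self, n_channels=14, n_classes=13):
            super().__init__()
            # 2D CNN over per-channel scalograms: time-frequency feature extraction
            self.cnn = nn.Sequential(
                nn.Conv2d(n_channels, 32, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d((2, 1)),             # pool the scale axis, keep time
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((1, None)),  # collapse scales, preserve time
            )
            # Dilated causal-style 1D convs stand in for the TCN stage
            # (long-term temporal dependencies across the gesture)
            self.tcn = nn.Sequential(
                nn.Conv1d(64, 64, 3, padding=2, dilation=2), nn.ReLU(),
                nn.Conv1d(64, 64, 3, padding=4, dilation=4), nn.ReLU(),
            )
            self.head = nn.Linear(64, n_classes)

        def forward(self, x):           # x: (batch, channels, scales, time)
            f = self.cnn(x).squeeze(2)  # -> (batch, 64, time)
            f = self.tcn(f)             # -> (batch, 64, time)
            return self.head(f.mean(dim=-1))  # time-averaged logits per gesture

    model = CWTCNNTCN()
    x = torch.randn(8, 14, 32, 120)  # 8 windows, 32 CWT scales, 120 timesteps
    print(model(x).shape)            # torch.Size([8, 13]): one logit per gesture

Collapsing the scale axis before the temporal stage keeps the time resolution intact, which is what lets the dilated convolutions model dependencies across the full length of a gesture.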

Keywords