IEEE Access (Jan 2021)
Stereo Feature Learning Based on Attention and Geometry for Absolute Hand Pose Estimation in Egocentric Stereo Views
Abstract
Egocentric hand pose estimation is important for wearable cameras because hand interactions are captured from an egocentric viewpoint. Several studies on hand pose estimation have recently been presented based on RGBD or RGB sensors. Although these methods provide accurate hand pose estimation, they have several limitations. For example, RGB-based techniques have intrinsic difficulty in converting relative 3D poses into absolute 3D poses, and RGBD-based techniques only work in indoor environments. Recently, stereo-sensor-based techniques have gained increasing attention owing to their potential to overcome these limitations. However, to the best of our knowledge, there are few techniques and no real datasets available for egocentric stereo vision. In this paper, we propose a top-down pipeline for estimating absolute 3D hand poses using stereo sensors, as well as a novel dataset for training. Our top-down pipeline consists of two steps: hand detection and hand pose estimation. Hand detection first localizes the hand areas; hand pose estimation then estimates the positions of the hand joints. In particular, for hand pose estimation with a stereo camera, we propose an attention-based architecture called StereoNet, a geometry-based loss function called StereoLoss, and a novel 2D disparity map called StereoDMap for effective stereo feature learning. To collect the dataset, we propose a novel annotation method that helps reduce human annotation effort. Our dataset is publicly available at https://github.com/seo0914/SEH. We conducted comprehensive experiments to demonstrate the effectiveness of our approach compared with state-of-the-art methods.
Keywords