Stereo Feature Learning Based on Attention and Geometry for Absolute Hand Pose Estimation in Egocentric Stereo Views

Kyeongeun Seo; Hyeonjoong Cho; Daewoong Choi; Taewook Heo

doi:10.1109/ACCESS.2021.3105969

IEEE Access (Jan 2021)

Affiliations

Kyeongeun Seo: Department of Computer Convergence Software, Korea University, Sejong-si, South Korea
Hyeonjoong Cho: Department of Computer Convergence Software, Korea University, Sejong-si, South Korea
Daewoong Choi: Department of Computer Convergence Software, Korea University, Sejong-si, South Korea
Taewook Heo: Electronics and Telecommunications Research Institute, Daejeon-si, South Korea

Journal volume & issue: Vol. 9, pp. 116083–116093

Abstract

Egocentric hand pose estimation is important for wearable cameras because hand interactions are captured from an egocentric viewpoint. Several studies on hand pose estimation have recently been presented based on RGBD or RGB sensors. Although these methods provide accurate hand pose estimation, they have several limitations. For example, RGB-based techniques have intrinsic difficulty in converting relative 3D poses into absolute 3D poses, and RGBD-based techniques only work in indoor environments. Recently, stereo-sensor-based techniques have gained increasing attention owing to their potential to overcome these limitations. However, to the best of our knowledge, there are few techniques and no real datasets available for egocentric stereo vision. In this paper, we propose a top-down pipeline for estimating absolute 3D hand poses using stereo sensors, as well as a novel dataset for training. Our top-down pipeline consists of two steps: hand detection and hand pose estimation. Hand detection first localizes the hand regions; hand pose estimation then estimates the positions of the hand joints within those regions. In particular, for hand pose estimation with a stereo camera, we propose an attention-based architecture called StereoNet, a geometry-based loss function called StereoLoss, and a novel 2D disparity map called StereoDMap for effective stereo feature learning. To collect the dataset, we propose a novel annotation method that reduces human annotation effort. Our dataset is publicly available at https://github.com/seo0914/SEH. We conducted comprehensive experiments to demonstrate the effectiveness of our approach compared with state-of-the-art methods.
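
To illustrate why a stereo pair can yield absolute rather than relative 3D poses, the following is a minimal sketch of standard rectified-stereo triangulation. It is not the paper's StereoNet, StereoLoss, or StereoDMap; the function name, the parameter names, and the assumption of an ideal rectified pinhole pair with known focal length and baseline are all illustrative.

```python
import numpy as np


def triangulate_rectified(uv_left, uv_right, focal_px, baseline_m, cx, cy):
    """Back-project matched 2D joints from a rectified stereo pair.

    Hypothetical helper: assumes an ideal rectified pinhole pair, so that
    matched joints lie on the same image row and depth follows
    Z = focal_px * baseline_m / disparity. Returns an (N, 3) array of
    metric coordinates in the left camera frame.
    """
    uv_left = np.asarray(uv_left, dtype=float)
    uv_right = np.asarray(uv_right, dtype=float)

    # Horizontal disparity between the left and right views of each joint.
    disparity = uv_left[:, 0] - uv_right[:, 0]
    if np.any(disparity <= 0):
        raise ValueError("disparity must be positive for points in front of the cameras")

    # Absolute depth in metres, then pinhole back-projection for X and Y.
    z = focal_px * baseline_m / disparity
    x = (uv_left[:, 0] - cx) * z / focal_px
    y = (uv_left[:, 1] - cy) * z / focal_px
    return np.stack([x, y, z], axis=1)


# One joint seen at u=370 px (left) and u=310 px (right), with assumed
# intrinsics: 500 px focal length, 6 cm baseline, principal point (320, 240).
joints_3d = triangulate_rectified(
    [[370.0, 240.0]], [[310.0, 240.0]],
    focal_px=500.0, baseline_m=0.06, cx=320.0, cy=240.0,
)
```

Because the baseline is known in metric units, the recovered depth is absolute, which is the property a single RGB camera cannot provide without extra scale information.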

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
