IEEE Access (Jan 2020)
Light and Fast Hand Pose Estimation From Spatial-Decomposed Latent Heatmap
Abstract
We present a light and efficient approach, named the Latent Fusion network, for fast and accurate hand pose estimation from a single depth image. Our method decomposes 3D joint regression into 2D plane localization and 1D axis estimation from different spatial perspectives. We design multiple latent heatmap regression branches that each predict the hand pose separately, and a fusion network that outputs the final result. Experiments on three public hand pose datasets (ICVL, NYU, MSRA) demonstrate that our system achieves state-of-the-art accuracy. Moreover, our method outperforms all top-ranked approaches by a large margin in both inference speed (nearly a thousand frames per second) and model size (less than 10 MB).
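To make the decomposition concrete, the following is a minimal sketch of how one joint's 3D coordinate could be read out from a 2D plane heatmap and a 1D axis heatmap. It is not the authors' implementation: the soft-argmax readout, the function names, and the heatmap sizes are all assumptions chosen for illustration, and the fusion network that combines several such branches is not shown.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over all elements of x."""
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_argmax_2d(heatmap):
    """Expected (u, v) = (column, row) position under a 2D heatmap.

    Assumption: the heatmap is turned into a probability map with a
    softmax, and the coordinate is the expectation over pixel indices.
    """
    h, w = heatmap.shape
    p = softmax(heatmap.ravel()).reshape(h, w)
    u = (p.sum(axis=0) * np.arange(w)).sum()  # marginal over rows, expect column
    v = (p.sum(axis=1) * np.arange(h)).sum()  # marginal over columns, expect row
    return u, v

def soft_argmax_1d(heatmap):
    """Expected bin index under a 1D heatmap along the remaining axis."""
    p = softmax(heatmap)
    return (p * np.arange(heatmap.shape[0])).sum()

def joint_from_decomposition(plane_heatmap, axis_heatmap):
    """Combine a 2D plane estimate and a 1D axis estimate into one 3D point."""
    u, v = soft_argmax_2d(plane_heatmap)
    d = soft_argmax_1d(axis_heatmap)
    return np.array([u, v, d])

# Hypothetical heatmaps for a single joint, each with one sharp peak.
plane = np.full((32, 32), -50.0)
plane[5, 12] = 50.0          # peak at row 5, column 12
axis = np.full(16, -50.0)
axis[7] = 50.0               # peak at bin 7

joint = joint_from_decomposition(plane, axis)
```

With sharply peaked heatmaps such as these, the readout is close to the peak indices (column 12, row 5, bin 7). Under this assumption, each branch would apply the same readout to a different plane and axis pairing before the results are fused.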
Keywords