IEEE Access (Jan 2020)
3D Human Pose Estimation With Generative Adversarial Networks
Abstract
3D human pose estimation from a monocular RGB image is a challenging task in computer vision because of the depth ambiguity inherent in a single RGB image. Because most methods consider joint locations independently, which can lead to overfitting on specific datasets, it is crucial to assess the plausibility of 3D poses in terms of their overall structure. In this paper, we present Generative Adversarial Networks (GANs) for 3D human pose estimation, which learn plausible 3D human body representations through adversarial training. In our framework, the generator regresses 3D joint positions from a 2D input, and the discriminator aims to distinguish ground-truth 3D samples from predicted ones. We leverage Graph Convolutional Networks (GCNs) in both the generator and the discriminator to fully exploit the spatial relations among input and output coordinates. The combination of GANs and GCNs enables the network to predict more accurate 3D joint locations while simultaneously learning more plausible human body structures. We demonstrate the effectiveness of our approach on standard benchmark datasets (i.e., Human3.6M and HumanEva-I), where it outperforms state-of-the-art methods. Furthermore, we propose a new evaluation metric, the distance-based Pose Structure Score (dPSS), which measures the structural similarity between a predicted 3D pose and its ground truth.
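The pipeline described above can be sketched in miniature: a graph convolution over the skeleton's adjacency lifts 2D joint coordinates to 3D, and a toy discriminator scores the result's realism for the adversarial loss. This is a minimal illustrative sketch, not the paper's actual architecture; the skeleton edge list, layer sizes, and all function names here are assumptions for demonstration only.

```python
import numpy as np

# Assumed 16-joint skeleton edges (Human3.6M-style), illustrative only.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (0, 7), (7, 8),
         (8, 9), (8, 10), (10, 11), (11, 12), (8, 13), (13, 14), (14, 15)]
J = 16  # number of joints

def normalized_adjacency(edges, n):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    A = np.eye(n)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def gcn_layer(X, A_hat, W):
    """One graph convolution with ReLU: features mix along skeleton edges."""
    return np.maximum(A_hat @ X @ W, 0.0)

rng = np.random.default_rng(0)
A_hat = normalized_adjacency(EDGES, J)

# Generator: 2D joint coordinates (J, 2) -> 3D joint positions (J, 3).
X2d = rng.standard_normal((J, 2))
W1 = rng.standard_normal((2, 64)) * 0.1   # hidden width 64 is an assumption
W2 = rng.standard_normal((64, 3)) * 0.1
H = gcn_layer(X2d, A_hat, W1)
pred3d = A_hat @ H @ W2                   # predicted 3D pose, shape (J, 3)

# Toy discriminator: GCN features pooled to one realism score in (0, 1).
Wd = rng.standard_normal((3, 1)) * 0.1
score = 1.0 / (1.0 + np.exp(-(A_hat @ pred3d @ Wd).mean()))
g_adv_loss = -np.log(score + 1e-8)        # generator's adversarial loss term
```

In training, this adversarial term would be combined with a joint-position regression loss, so the generator is pushed toward both accurate coordinates and structurally plausible poses.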
Keywords