Towards Single 2D Image-Level Self-Supervision for 3D Human Pose and Shape Estimation

Junuk Cha; Muhammad Saqlain; Changhwa Lee; Seongyeong Lee; Seungeun Lee; Donguk Kim; Won-Hee Park; Seungryul Baek

doi:10.3390/app11209724

Applied Sciences (Oct 2021)

Affiliations

Junuk Cha: AI Graduate School, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea
Muhammad Saqlain: AI Graduate School, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea
Changhwa Lee: Department of Computer Science and Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea
Seongyeong Lee: Department of Computer Science and Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea
Seungeun Lee: Department of Computer Science and Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea
Donguk Kim: AI Graduate School, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea
Won-Hee Park: Railway Safety Research Division, Korea Railroad Research Institute, Uiwang-si 16105, Korea
Seungryul Baek: AI Graduate School, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea

DOI: https://doi.org/10.3390/app11209724
Journal volume & issue: Vol. 11, no. 20, p. 9724

Abstract

Three-dimensional human pose and shape estimation is an important problem in computer vision, with numerous applications such as augmented reality, virtual reality, and human-computer interaction. However, training accurate deep-learning-based 3D human pose and shape estimators requires a large number of images paired with 3D ground-truth poses, which are costly to collect. To relieve this constraint, various weakly supervised and self-supervised pose estimation approaches have been proposed. Nevertheless, these methods still rely on supervision signals that require effort to collect, such as large-scale unpaired 3D ground-truth data, a small subset of 3D-labeled data, or video priors. Often, they also require installing equipment such as a calibrated multi-camera system to acquire strong multi-view priors. In this paper, we propose a self-supervised learning framework for 3D human pose and shape estimation that uses only single 2D images and requires no other form of supervision signal. Our framework takes single 2D images as input, estimates 3D human meshes in its intermediate layers, and is trained to solve four self-supervision tasks (three image manipulation tasks and one neural rendering task) whose ground truths are all derived from the single 2D images themselves. Through experiments on 3D human pose benchmark datasets (Human3.6M, 3DPW, and LSP), we demonstrate the effectiveness of our approach, which achieves a new state of the art among weakly/self-supervised methods.
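
To make concrete the idea of a self-supervision task whose ground truth comes from the image itself, the following minimal sketch builds training pairs for a rotation-prediction task. This is an illustration only: the abstract does not name the three image manipulation tasks, so rotation prediction is an assumed stand-in, and the function names `rotate90` and `make_rotation_sample` are hypothetical rather than taken from the paper.

```python
import random


def rotate90(image, k):
    """Rotate a 2D image (list of rows) counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        image = [list(row) for row in zip(*image)][::-1]
    return [list(row) for row in image]


def make_rotation_sample(image, rng):
    """Return (rotated_image, label).

    The label (number of quarter turns) is produced by the manipulation
    itself, so no external annotation is needed. Assumed example task,
    not necessarily one of the tasks used in the paper.
    """
    k = rng.randrange(4)
    return rotate90(image, k), k


# Toy 2x2 "image"; a real pipeline would operate on pixel arrays.
image = [[1, 2],
         [3, 4]]
rotated, label = make_rotation_sample(image, random.Random(0))
```

A network trained on such pairs would receive `rotated` as input and be asked to predict `label`; the supervision signal costs nothing to collect because it is generated together with the input.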

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
