Dual‐view 3D human pose estimation without camera parameters for action recognition

Long Liu; Le Yang; Wanjun Chen; Xin Gao

doi:10.1049/ipr2.12277

IET Image Processing (Dec 2021)

Affiliations

Long Liu: School of Automation and Information Engineering Xi'an University of Technology Xi'an Shaanxi China
Le Yang: School of Automation and Information Engineering Xi'an University of Technology Xi'an Shaanxi China
Wanjun Chen: Department of Information Science Xi'an University of Technology Xi'an Shaanxi China
Xin Gao: School of Automation and Information Engineering Xi'an University of Technology Xi'an Shaanxi China

DOI: https://doi.org/10.1049/ipr2.12277
Journal volume & issue: Vol. 15, no. 14
pp. 3433 – 3440

Abstract

The purpose of 3D human pose estimation is to estimate the 3D coordinates of key points of the human body directly from images. Although multi‐view methods estimate coordinates more accurately than single‐view methods, they require the camera parameters to be known. To remove this constraint and improve the generalizability of the model, a dual‐view single‐person 3D pose estimation method that needs no camera parameters is proposed. The method first uses the 2D pose estimation network HR‐net to estimate the 2D joint coordinates from two images taken from different views, and then feeds them into a 3D regression network to generate the final 3D joint coordinates. So that the 3D regression network fully learns the spatial structure of the human body and the projective transformation between views, a self‐supervised training method is designed that generates virtual views based on an orthogonal projection model of the 3D human pose. In pose estimation experiments on the Human3.6M dataset, the method achieves a significantly improved estimation error of 34.5 mm. Furthermore, action recognition based on the human poses extracted by the proposed method is conducted, and an accuracy of 83.19% is obtained.
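
The virtual-view idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's projection model in detail, so the following assumes the simplest orthogonal (orthographic) case: a 3D pose is rotated about the vertical axis and the depth coordinate is dropped to obtain 2D joints for a synthetic view. The function names, the choice of rotation axis, and the toy pose are all hypothetical and are not taken from the paper.

```python
import math

def rotate_about_vertical(joints, angle_rad):
    """Rotate 3D joints (x, y, z) about the y axis (assumed here to be vertical)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]

def orthographic_project(joints):
    """Orthographic projection: keep (x, y) and drop the depth coordinate z."""
    return [(x, y) for x, y, _ in joints]

def virtual_view(joints, angle_rad):
    """2D joints seen from a virtual camera rotated by angle_rad (illustrative only)."""
    return orthographic_project(rotate_about_vertical(joints, angle_rad))

# Toy three-joint "pose" (hypothetical values, arbitrary units).
pose_3d = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.2, 0.5, 0.1)]

view_a = virtual_view(pose_3d, 0.0)          # original viewing direction
view_b = virtual_view(pose_3d, math.pi / 2)  # virtual view rotated by 90 degrees
```

Under these assumptions, pairs such as `view_a` and `view_b` could serve as dual-view 2D inputs whose underlying 3D pose is known, which is the kind of supervision signal a self-supervised scheme of this type relies on. No camera intrinsics or extrinsics appear anywhere in the projection.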

Published in IET Image Processing

ISSN: 1751-9659 (Print); 1751-9667 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Photography; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519667
