Multi-view 3D human pose estimation based on multi-scale feature by orthogonal projection

Wang Yinghan; Dong Jianmin; Wang Yanan; Sun Bingyang

doi:10.1051/e3sconf/202452201043

E3S Web of Conferences (Jan 2024)

Affiliations

Wang Yinghan: College of Information Engineering, Xizang Minzu University
Dong Jianmin: College of Information Engineering, Xizang Minzu University
Wang Yanan: College of Information Engineering, Xizang Minzu University
Sun Bingyang: College of Information Engineering, Xizang Minzu University

Journal volume & issue: Vol. 522
p. 01043

Abstract

To address inaccurate estimation results, the complicated matching of feature information across views, and the poor robustness of network models in complex scenes, a multi-view multi-person 3D human pose estimation model based on multi-scale feature orthogonal projection is proposed. The model consists of a multi-scale orthogonal projection fusion network and an orthogonal feature ascending dimension network. First, the multi-scale orthogonal projection fusion network orthogonally projects features at multiple scales and uses a residual structure to fuse the features on each projection plane separately, which eases feature learning and reduces the information lost through projection. The fused features are then fed into the orthogonal feature ascending dimension network, which reconstructs higher-level 3D features using trilinear interpolation and deconvolution to improve the expressiveness of the model. Finally, these features are fed to the backbone network to supplement its high-dimensional features, and the network regresses the 3D human pose in stages corresponding to the different parts of the task. Experimental results show that, compared with the previous method, the Percentage of Correct Parts in 3D (PCP3D) improves on the Campus and Shelf datasets, while on the CMU Panoptic dataset the Mean Per Joint Position Error (MPJPE) is reduced and the average accuracy improves at smaller thresholds. When fewer camera views are given to the trained model, its predictions are also better than those of the previous method. The proposed method thus estimates 3D human pose effectively, improves prediction accuracy, and makes the network model more robust.
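The idea of projecting a 3D feature volume onto orthogonal planes, fusing each plane across scales with a residual connection, and lifting the planes back to 3D can be illustrated with a small NumPy sketch. It is not the authors' implementation. The following choices are assumptions made only for illustration: max-pooling along the dropped axis as the projection, 2x average pooling and nearest-neighbour upsampling as stand-ins for the learned multi-scale layers, and broadcasting-and-summing the three planes in place of the paper's trilinear interpolation and deconvolution. All function names are hypothetical.

```python
import numpy as np


def orthogonal_project(volume: np.ndarray) -> dict:
    """Project a (C, X, Y, Z) feature volume onto the three orthogonal planes.

    Assumption: max-pooling along the collapsed axis; the paper's exact
    reduction is not specified in the abstract.
    """
    return {
        "xy": volume.max(axis=3),  # (C, X, Y): collapse Z
        "xz": volume.max(axis=2),  # (C, X, Z): collapse Y
        "yz": volume.max(axis=1),  # (C, Y, Z): collapse X
    }


def downsample2(plane: np.ndarray) -> np.ndarray:
    """2x average pooling over the two spatial axes of a (C, H, W) plane."""
    c, h, w = plane.shape
    return plane.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2(plane: np.ndarray) -> np.ndarray:
    """2x nearest-neighbour upsampling (hypothetical stand-in for deconvolution)."""
    return plane.repeat(2, axis=1).repeat(2, axis=2)


def multiscale_fuse(plane: np.ndarray, num_scales: int = 2) -> np.ndarray:
    """Fuse one plane's features across scales with residual additions.

    Builds a coarse-to-fine pyramid, then walks back up: each finer level
    keeps its own features (the residual path) and adds upsampled coarse context.
    Spatial sizes must be divisible by 2 ** (num_scales - 1).
    """
    pyramid = [plane]
    for _ in range(num_scales - 1):
        pyramid.append(downsample2(pyramid[-1]))
    fused = pyramid[-1]
    for finer in reversed(pyramid[:-1]):
        fused = finer + upsample2(fused)
    return fused


def lift_to_volume(planes: dict) -> np.ndarray:
    """Rebuild a (C, X, Y, Z) volume by broadcasting each plane along its
    missing axis and summing (assumed stand-in for the learned lifting)."""
    return (
        planes["xy"][:, :, :, None]    # (C, X, Y, 1)
        + planes["xz"][:, :, None, :]  # (C, X, 1, Z)
        + planes["yz"][:, None, :, :]  # (C, 1, Y, Z)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vol = rng.random((4, 8, 8, 8))  # toy feature volume: 4 channels, 8^3 voxels
    planes = {k: multiscale_fuse(p) for k, p in orthogonal_project(vol).items()}
    rebuilt = lift_to_volume(planes)
    print(rebuilt.shape)  # (4, 8, 8, 8)
```

The planes are much cheaper to process than the full volume (three H x W maps instead of one H x W x D grid), which is one plausible reading of the abstract's claim that projection simplifies feature learning; the residual path in `multiscale_fuse` keeps the fine-scale features intact while adding coarser context.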

Published in E3S Web of Conferences

ISSN: 2267-1242 (Online)
Publisher: EDP Sciences
Country of publisher: France
LCC subjects: Geography. Anthropology. Recreation: Environmental sciences
Website: http://www.e3s-conferences.org/

About the journal