IEEE Access (Jan 2024)

End-to-End 3D Human Pose Estimation Network With Multi-Layer Feature Fusion

  • Guoci Cai,
  • Changshe Zhang,
  • Jingxiu Xie,
  • Jie Pan,
  • Chaopeng Li,
  • Yiliang Wu

DOI
https://doi.org/10.1109/ACCESS.2024.3419032
Journal volume & issue
Vol. 12
pp. 89124 – 89134

Abstract

3D human pose estimation is the task of recovering the position of the human body in three-dimensional space by identifying body rotations, joint angles, and other pose-related information from image or video data. In this paper, we propose an end-to-end 3D human pose estimation network based on multi-level feature fusion. The network is composed of two main components. The first component utilizes the deepest features extracted by the backbone network. These features undergo initial encoding and are then processed by the Semantic Information Extraction Module, which is built primarily around a multi-head self-attention mechanism; this module extracts deeper features and produces the primary human body feature data. The second component feeds the shallowest features into the Global Information Processing Module, which performs global feature extraction. The features from both components, together with the bounding box (Bbox) information, are fed into the Iterative Regression Module, which outputs human pose data used to reconstruct the human body with a human pose model. To evaluate our method, we train and test it on the well-known benchmark datasets 3DPW, AGORA, and MPII. Our method performs strongly, reducing PA-MPJPE by approximately 5.3% and MPJPE by approximately 5.1% compared with the best model we referenced.
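For concreteness, the sketch below shows one way the pipeline described in the abstract could be wired together in PyTorch: encoded deep backbone features pass through a self-attention module, shallow features through a global pooling module, and both are concatenated with the Bbox information for iterative pose regression. All module names, tensor shapes, the three-value Bbox encoding, the residual update rule, and the 24-joint 6D-rotation output are assumptions made for illustration; they are not taken from the paper's implementation.

    # Minimal sketch of the described pipeline (assumed details, not the authors' code).
    import torch
    import torch.nn as nn

    class SemanticInformationExtractionModule(nn.Module):
        """Multi-head self-attention over encoded deep backbone features."""
        def __init__(self, dim=512, num_heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, tokens):                 # tokens: (B, N, dim)
            out, _ = self.attn(tokens, tokens, tokens)
            return self.norm(tokens + out)         # residual + layer norm

    class GlobalInformationProcessingModule(nn.Module):
        """Global feature extraction from the shallowest backbone features."""
        def __init__(self, in_channels=64, dim=512):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.proj = nn.Linear(in_channels, dim)

        def forward(self, feat):                   # feat: (B, C, H, W)
            return self.proj(self.pool(feat).flatten(1))   # (B, dim)

    class IterativeRegressionModule(nn.Module):
        """Refines pose parameters over a few iterations (assumed HMR-style loop)."""
        def __init__(self, dim=512, bbox_dim=3, pose_dim=24 * 6, n_iter=3):
            super().__init__()
            self.n_iter, self.pose_dim = n_iter, pose_dim
            self.fc = nn.Sequential(
                nn.Linear(dim * 2 + bbox_dim + pose_dim, 1024),
                nn.ReLU(),
                nn.Linear(1024, pose_dim),
            )

        def forward(self, deep_feat, global_feat, bbox):
            pose = deep_feat.new_zeros(deep_feat.size(0), self.pose_dim)
            for _ in range(self.n_iter):
                x = torch.cat([deep_feat, global_feat, bbox, pose], dim=1)
                pose = pose + self.fc(x)           # residual pose update
            return pose

    if __name__ == "__main__":
        B, dim = 2, 512
        deep_tokens = torch.randn(B, 49, dim)      # encoded deep features
        shallow_feat = torch.randn(B, 64, 56, 56)  # shallow backbone features
        bbox = torch.randn(B, 3)                   # assumed bbox scale + center

        sem = SemanticInformationExtractionModule(dim)
        glob = GlobalInformationProcessingModule(64, dim)
        reg = IterativeRegressionModule(dim)

        pose = reg(sem(deep_tokens).mean(dim=1), glob(shallow_feat), bbox)
        print(pose.shape)                          # torch.Size([2, 144])

The regressed pose parameters would then be passed to a parametric human body model to reconstruct the 3D mesh, as the abstract describes.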

Keywords