IEEE Access (Jan 2024)
EdgePose: Real-Time Human Pose Estimation Scheme for Industrial Scenes
Abstract
Common human pose estimation methods rely on 2D heatmap regression, which requires expensive upsampling layers to maintain heatmap resolution and additional post-processing to decode keypoint coordinates. These components slow down inference. To address this problem, we propose EdgePose, a new real-time human pose estimation framework. First, we design a convolutional module, EdgeBlock-C, and an edge attention module, EdgeBlock-T, and then build a hybrid network from them that combines the strengths of ConvNets and Vision Transformers (ViT). In addition, EdgePose simplifies the pose estimation pipeline by recasting keypoint coordinate prediction as pixel classification along the horizontal and vertical axes. This eliminates the upsampling and post-processing operations that hinder inference, increasing speed while maintaining accuracy. Experimental results show that EdgePose-S achieves an AP of 68.6 on the COCO validation set at 285.7 FPS on an Intel i9-10920X CPU. On the embedded Jetson Xavier NX platform, EdgePose-B achieves an AP of 72.2 at 51.3 FPS, outperforming existing two-stage pose estimation methods.
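To illustrate why per-axis classification avoids heatmap upsampling and post-processing, the sketch below decodes keypoints from horizontal and vertical classification scores. It assumes a SimCC-style representation in which each axis is divided into bins at a fixed `split_ratio` (bins per input pixel). The function name `decode_keypoints`, the `split_ratio` default, and the score heuristic are hypothetical choices for illustration, not EdgePose's actual head or decoder. Each coordinate reduces to an arg-max over a 1D vector followed by a single division.

```python
from typing import List, Tuple


def decode_keypoints(
    x_logits: List[List[float]],
    y_logits: List[List[float]],
    split_ratio: float = 2.0,  # hypothetical: number of bins per input pixel
) -> List[Tuple[float, float, float]]:
    """Decode per-keypoint axis classification scores into (x, y, score).

    x_logits[k] holds the horizontal-axis bin scores for keypoint k and
    y_logits[k] holds its vertical-axis bin scores. No 2D heatmap is
    built: each coordinate is the arg-max bin divided by split_ratio.
    """
    results = []
    for xs, ys in zip(x_logits, y_logits):
        # Index of the highest-scoring bin on each axis (first one on ties).
        bx = max(range(len(xs)), key=xs.__getitem__)
        by = max(range(len(ys)), key=ys.__getitem__)
        # Map bin indices back to input-image pixel coordinates.
        x = bx / split_ratio
        y = by / split_ratio
        # Assumed confidence heuristic: the weaker of the two axis maxima.
        score = min(xs[bx], ys[by])
        results.append((x, y, score))
    return results


if __name__ == "__main__":
    # One keypoint: 4 horizontal bins and 4 vertical bins at split_ratio 2.
    print(decode_keypoints([[0.1, 0.2, 0.9, 0.3]], [[0.7, 0.1, 0.05, 0.0]]))
```

Running the example selects horizontal bin 2 and vertical bin 0, which map to pixel coordinates (1.0, 0.0) with score 0.7. A larger `split_ratio` gives sub-pixel precision at the cost of longer classification vectors.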
Keywords