Alexandria Engineering Journal (Dec 2024)

Enhancing human pose estimation in sports training: Integrating spatiotemporal transformer for improved accuracy and real-time performance

  • Xinyao Xi,
  • Chen Zhang,
  • Wen Jia,
  • Ruxue Jiang

Journal volume & issue
Vol. 109
pp. 144 – 156

Abstract

Human pose estimation for sports training is an important application in Internet of Things (IoT) environments, where IoT devices support performance analysis and injury prevention. Current methods struggle to deliver both real-time processing and accuracy in dynamic settings, especially with high-speed movements and diverse data. To address these challenges, we propose a novel dual-channel architecture that combines a Spatiotemporal Transformer with a Temporal Convolutional Network (TCN) and is integrated into an IoT system. The system collects real-time motion data through IoT devices, including video, depth information, and sensor data, and the model combines global spatiotemporal features with local temporal dependencies to improve pose understanding and estimation accuracy. The Spatiotemporal Transformer uses multi-head self-attention to process global features, while the TCN captures local temporal dependencies across frames. A residual fusion mechanism integrates the two feature streams for comprehensive pose estimation. Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that our model significantly outperforms existing methods, achieving Mean Per Joint Position Error (MPJPE) scores of 42.2 mm and 29.1 mm on Human3.6M. This research advances 3D human pose estimation and offers a practical tool for sports training: precise, efficient pose analysis that uses deep learning and IoT technologies to enhance athletic performance and prevent injuries.
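
For readers unfamiliar with the metric reported above, MPJPE is the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints and frames. The following is a minimal sketch of that definition only. The function name `mpjpe`, the nested-list input layout, and the toy coordinates are illustrative assumptions, and the abstract does not state which evaluation protocol or alignment step (for example root-centering or rigid alignment) produced each reported score, so none is applied here.

```python
import math


def mpjpe(predicted, target):
    """Mean Per Joint Position Error for a pose sequence.

    Both arguments are lists of frames; each frame is a list of
    (x, y, z) joint coordinates. The result is in the same unit
    as the inputs (millimetres in the toy example below).
    Assumption: no alignment is applied before comparison.
    """
    if len(predicted) != len(target):
        raise ValueError("sequences must contain the same number of frames")

    total_distance = 0.0
    joint_count = 0
    for pred_frame, true_frame in zip(predicted, target):
        if len(pred_frame) != len(true_frame):
            raise ValueError("frames must contain the same number of joints")
        for pred_joint, true_joint in zip(pred_frame, true_frame):
            # Euclidean distance between predicted and true joint position.
            total_distance += math.dist(pred_joint, true_joint)
            joint_count += 1

    if joint_count == 0:
        raise ValueError("no joints to evaluate")
    return total_distance / joint_count


# Toy data (hypothetical): one frame, two joints, coordinates in mm.
target = [[(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]]
predicted = [[(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]]

error_mm = mpjpe(predicted, target)
print(f"MPJPE: {error_mm:.1f} mm")
```

In the toy example the two joints are off by 5 mm and 12 mm, so the mean is 8.5 mm. A lower MPJPE means the predicted skeleton lies closer to the ground truth on average.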

Keywords