Prior-free 3D human pose estimation in a video using limb-vectors
Anam Memon,
Qasim Arain,
Nasrullah Pirzada,
Akram Shaikh,
Adel Sulaiman,
Mana Saleh Al Reshan,
Hani Alshahrani,
Asadullah Shaikh
Affiliations
Anam Memon
Department of Computer Systems Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
Qasim Arain
Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
Nasrullah Pirzada
Department of Telecommunication Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
Akram Shaikh
PASTIC National Centre, QAU Campus, Islamabad, Pakistan
Adel Sulaiman
Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia; Emerging Technologies Research Lab (ETRL), College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia
Mana Saleh Al Reshan
Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia; Emerging Technologies Research Lab (ETRL), College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia
Hani Alshahrani
Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia; Emerging Technologies Research Lab (ETRL), College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia
Asadullah Shaikh
Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia; Emerging Technologies Research Lab (ETRL), College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia; Corresponding author at: Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia.
Estimating accurate 3D human poses from a monocular video is fundamental to various computer vision tasks. Existing methods exploit 2D-to-3D pose lifting, multiview images, and depth sensors to model spatio-temporal dependencies. However, depth ambiguities, occlusions, and large temporal receptive fields pose challenges to these approaches. To address these challenges, we propose a novel prior-free method, based on a deep convolutional neural network (DCNN), that estimates 3D human pose from monocular image sequences using limb vectors. Our method comprises two subnetworks: a limb direction estimator and a limb length estimator. The limb direction estimator uses a fully convolutional network to model limb direction vectors across a temporal window. We show that dilated convolutions combined with a relatively small receptive field significantly reduce network complexity while maintaining estimation accuracy. The limb length estimator, in turn, derives stable limb length estimates from a reliable set of frames. Our model outperforms existing methods on the Human3.6M and MPI-INF-3DHP datasets.
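The limb-vector representation described above can be illustrated with a minimal sketch: if each limb is given as a direction and a length, joint positions follow by walking the kinematic tree from the root. The code below is an illustration under stated assumptions, not the authors' implementation. The five-joint skeleton `LIMBS`, the function names, and the use of a per-limb median over frames as a stand-in for the "reliable frame set" are all hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical toy skeleton (not from the paper): (parent, child) joint pairs.
# Joint 0 is the root; parents always appear before their children.
LIMBS = [(0, 1), (1, 2), (0, 3), (3, 4)]


def joints_from_limbs(directions, lengths, root=None):
    """Compose joint positions from limb directions and limb lengths.

    directions: (T, L, 3) array of limb direction vectors per frame.
    lengths:    (L,) array with one length per limb, shared by all frames.
    root:       optional (3,) or (T, 3) root position; defaults to the origin.
    Returns a (T, J, 3) array of joint positions.
    """
    directions = np.asarray(directions, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    # Normalize so that the lengths alone control the limb extent.
    norms = np.linalg.norm(directions, axis=-1, keepdims=True)
    unit = directions / np.clip(norms, 1e-8, None)

    n_frames = unit.shape[0]
    n_joints = max(child for _, child in LIMBS) + 1
    joints = np.zeros((n_frames, n_joints, 3))
    if root is not None:
        joints[:, 0] = root
    # Walk the tree: child = parent + length * unit direction.
    for i, (parent, child) in enumerate(LIMBS):
        joints[:, child] = joints[:, parent] + lengths[i] * unit[:, i]
    return joints


def limbs_from_joints(joints):
    """Decompose (T, J, 3) joints into unit directions (T, L, 3) and lengths (T, L)."""
    joints = np.asarray(joints, dtype=float)
    vecs = np.stack([joints[:, c] - joints[:, p] for p, c in LIMBS], axis=1)
    lengths = np.linalg.norm(vecs, axis=-1)
    return vecs / np.clip(lengths[..., None], 1e-8, None), lengths


def stable_lengths(per_frame_lengths):
    """Assumed stand-in for the length estimator: per-limb median over frames."""
    return np.median(np.asarray(per_frame_lengths, dtype=float), axis=0)
```

Separating direction from length in this way means that a single length vector is shared by every frame of a sequence, so only the directions need to be predicted per frame.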