Computational Visual Media (Oct 2022)
Imposing temporal consistency on deep monocular body shape and pose estimation
Abstract
Abstract Accurate and temporally consistent modeling of human bodies is essential for a wide range of applications, including character animation, understanding human social behavior, and AR/VR interfaces. Capturing human motion accurately from a monocular image sequence remains challenging; modeling quality is strongly influenced by temporal consistency of the captured body motion. Our work presents an elegant solution to integrating temporal constraints during fitting. This increases both temporal consistency and robustness during optimization. In detail, we derive parameters of a sequence of body models, representing shape and motion of a person. We optimize these parameters over the complete image sequence, fitting a single consistent body shape while imposing temporal consistency on the body motion, assuming body joint trajectories to be linear over short time. Our approach enables the derivation of realistic 3D body models from image sequences, including jaw pose, facial expression, and articulated hands. Our experiments show that our approach accurately estimates body shape and motion, even for challenging movements and poses. Further, we apply it to the particular application of sign language analysis, where accurate and temporally consistent motion modelling is essential, and show that the approach is well-suited to this kind of application.
Keywords