IEEE Access (Jan 2025)
AMT-Net: Adversarial Motion Transfer Network With Disentangled Shape and Pose for Realistic Image Animation
Abstract
Advances in computer vision enable motion transfer for animating static objects in images. However, current methods rely on manually collected motion labels and struggle to represent shape and pose accurately, particularly for human bodies, due to occlusions and background variations. To address these issues, we propose an Adversarial Motion Transfer Network with a disentangled Shape and Pose representation for realistic image Animation (AMT-Net), built on an encoder-decoder adversarial structure. Specifically, we design a pose and shape learning module that captures independent shape and pose information by training a discriminator with an adversarial loss, which improves the coherence of the generated animated frames. Furthermore, a motion estimation module is introduced to generate masks for objects in consecutive frames and to identify occluded parts by constructing occlusion maps from these masks and dense motion vectors. To evaluate the effectiveness of our approach, we conducted extensive experiments on four publicly available datasets: VoxCeleb, TaiChiHD, TED-Talks, and MGif. The results highlight the importance of landmark detection for video annotation and smooth frame transitions, and show that the disentangled shape and pose module yields more precise object representations.
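To make the encoder-decoder adversarial structure described above concrete, the following is a minimal PyTorch-style sketch of a generator with separate shape and pose codes and an image-level discriminator. All module names, channel sizes, image resolution, and loss weights are illustrative assumptions, not the authors' implementation, and the motion estimation and occlusion-map module is omitted.

```python
# Hypothetical sketch of an encoder-decoder adversarial setup with
# disentangled shape and pose codes; shapes and weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class Encoder(nn.Module):
    """Shared backbone with separate heads for shape (appearance) and pose."""
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(conv_block(3, 32), conv_block(32, 64), conv_block(64, 128))
        self.shape_head = nn.Linear(128, dim)
        self.pose_head = nn.Linear(128, dim)

    def forward(self, x):
        h = F.adaptive_avg_pool2d(self.backbone(x), 1).flatten(1)
        return self.shape_head(h), self.pose_head(h)

class Decoder(nn.Module):
    """Synthesizes a frame from a (shape, pose) code pair."""
    def __init__(self, dim=128):
        super().__init__()
        self.fc = nn.Linear(2 * dim, 128 * 8 * 8)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, shape_code, pose_code):
        h = self.fc(torch.cat([shape_code, pose_code], 1)).view(-1, 128, 8, 8)
        return self.up(h)

class Discriminator(nn.Module):
    """Image-level discriminator for the adversarial loss."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 32), conv_block(32, 64), conv_block(64, 128))
        self.head = nn.Linear(128, 1)

    def forward(self, x):
        return self.head(F.adaptive_avg_pool2d(self.net(x), 1).flatten(1))

# One illustrative generator step: animate the source frame with the pose of
# the driving frame, i.e. decode (shape of source, pose of driving).
enc, dec, disc = Encoder(), Decoder(), Discriminator()
source, driving = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)

shape_src, _ = enc(source)
_, pose_drv = enc(driving)
generated = dec(shape_src, pose_drv)

rec_loss = F.l1_loss(generated, driving)                # reconstruction term
adv_loss = F.binary_cross_entropy_with_logits(          # generator adversarial term
    disc(generated), torch.ones(generated.size(0), 1))
(rec_loss + 0.1 * adv_loss).backward()
```

In this toy setup the reconstruction term ties the generated frame to the driving frame, while the adversarial term pushes outputs toward realistic imagery; the disentanglement arises from always recombining the shape code of one frame with the pose code of another, in the spirit of the module described in the abstract.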
Keywords