IEEE Access (Jan 2024)
UniMotion-DM: Uniform Text-Motion Generation and Editing via Diffusion Model
Abstract
Diffusion models have demonstrated substantial success in controllable generation for continuous modalities, positioning them as highly suitable for tasks such as human motion generation. However, existing approaches are typically limited to single-task applications, such as text-to-motion generation, and often lack versatility and editing capabilities. To overcome these limitations, we propose UniMotion-DM, a unified framework for both text-motion generation and editing based on diffusion models. UniMotion-DM integrates three core components: 1) a Contrastive Text-Motion Variational Autoencoder (CTMV), which aligns text and motion in a shared latent space using contrastive learning; 2) a controllable diffusion model tailored to the CTMV representation for generating and editing multimodal content; and 3) a Multimodal Conditional Representation and Editing (MCRE) module that leverages CLIP embeddings to enable precise and flexible control across various tasks. By seamlessly handling text-to-motion generation, motion captioning, motion completion, and multimodal editing, UniMotion-DM achieves significant improvements in both quantitative and qualitative evaluations. Beyond conventional domains such as gaming and virtual reality, we emphasize UniMotion-DM’s potential in underexplored fields such as healthcare and the creative industries. For example, UniMotion-DM could be used to generate personalized physical therapy routines or assist designers in rapidly prototyping motion-based narratives. By addressing these emerging applications, UniMotion-DM paves the way for applying multimodal generative models in interdisciplinary and socially impactful areas.
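To make the roles of the components concrete, the following is a minimal, illustrative PyTorch sketch of how a latent text-motion diffusion pipeline of this kind can be wired together: a VAE-style encoder produces a shared latent, and a conditional denoiser is trained in that latent space with a text embedding as the condition. All class names, dimensions, and the noise schedule here are assumptions made for illustration and are not taken from the paper.

import torch
import torch.nn as nn

class CTMVEncoder(nn.Module):
    """Toy stand-in for the contrastive text-motion VAE encoder:
    maps a motion sequence to mean/log-variance of a shared latent."""
    def __init__(self, motion_dim=263, latent_dim=256):
        super().__init__()
        self.backbone = nn.GRU(motion_dim, latent_dim, batch_first=True)
        self.to_mu = nn.Linear(latent_dim, latent_dim)
        self.to_logvar = nn.Linear(latent_dim, latent_dim)

    def forward(self, motion):                      # motion: (B, T, motion_dim)
        _, h = self.backbone(motion)                # h: (1, B, latent_dim)
        h = h.squeeze(0)
        return self.to_mu(h), self.to_logvar(h)

class LatentDenoiser(nn.Module):
    """Toy conditional denoiser operating in the latent space,
    conditioned on a text embedding (e.g. from CLIP)."""
    def __init__(self, latent_dim=256, cond_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 512), nn.SiLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, z_t, t, cond):
        t = t.float().unsqueeze(-1) / 1000.0        # scalar timestep feature
        return self.net(torch.cat([z_t, cond, t], dim=-1))

def training_step(encoder, denoiser, motion, text_emb, num_steps=1000):
    """One DDPM-style epsilon-prediction step on a sampled latent."""
    mu, logvar = encoder(motion)
    z0 = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)      # sample latent
    t = torch.randint(0, num_steps, (z0.shape[0],))
    alpha_bar = torch.cos(t.float() / num_steps * torch.pi / 2) ** 2  # toy schedule
    noise = torch.randn_like(z0)
    z_t = alpha_bar.sqrt().unsqueeze(-1) * z0 + (1 - alpha_bar).sqrt().unsqueeze(-1) * noise
    pred = denoiser(z_t, t, text_emb)
    return nn.functional.mse_loss(pred, noise)

if __name__ == "__main__":
    enc, den = CTMVEncoder(), LatentDenoiser()
    motion = torch.randn(4, 60, 263)       # batch of 4 motions, 60 frames each
    text_emb = torch.randn(4, 512)         # placeholder for CLIP text embeddings
    print(training_step(enc, den, motion, text_emb).item())

A full system would replace the GRU encoder and MLP denoiser with the paper's actual architectures and obtain the conditioning vector from a frozen CLIP text encoder, but the overall structure of diffusing in a shared text-motion latent space is the same.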
Keywords