EURASIP Journal on Audio, Speech, and Music Processing (Sep 2024)

Dance2Music-Diffusion: leveraging latent diffusion models for music generation from dance videos

  • Chaoyang Zhang,
  • Yan Hua

DOI
https://doi.org/10.1186/s13636-024-00370-6
Journal volume & issue
Vol. 2024, no. 1
pp. 1 – 12

Abstract

With the rapid development of social networks, short videos have become a popular form of content, dance videos in particular. In this context, automatically generating music for dance videos has significant practical value. However, existing studies face challenges such as limited richness in music timbre and a lack of synchronization with dance movements. In this paper, we propose Dance2Music-Diffusion, a novel framework for generating music from dance videos using latent diffusion models. Our approach comprises a motion encoder module that extracts motion features and a music diffusion generation module that generates latent music representations. By integrating dance-type supervision with latent diffusion techniques, our framework outperforms existing methods in generating complex and rich dance music. We conducted objective and subjective evaluations of the results produced by various existing models on the AIST++ dataset; our framework shows outstanding performance in beat recall rate, consistency with ground-truth beats, and coordination with dance movements. This work advances the state of the art in automatic music generation from dance videos, is easy to train, and has implications for enhancing entertainment experiences and inspiring innovative dance productions. Sample videos of our generated music and dance can be viewed at https://youtu.be/eCvLdLdkX-Y. The code is available at https://github.com/hellto/dance2music-diffusion.
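
To make the two-module pipeline described above concrete, the following is a minimal, illustrative PyTorch sketch of one motion-conditioned latent diffusion training step (motion encoder conditioning a denoiser over latent music representations). All module names, tensor shapes, and hyperparameters here are hypothetical assumptions for illustration only; they are not taken from the paper or its released code.

# Minimal sketch: motion encoder + motion-conditioned latent music denoiser.
# Everything below (names, shapes, noise schedule) is a simplified assumption.
import torch
import torch.nn as nn

class MotionEncoder(nn.Module):
    """Encodes a sequence of pose features into motion features."""
    def __init__(self, joint_dim=72, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(joint_dim, hidden, batch_first=True)

    def forward(self, motion):               # motion: (B, T, joint_dim)
        feats, _ = self.rnn(motion)
        return feats                          # (B, T, hidden)

class LatentMusicDenoiser(nn.Module):
    """Predicts the noise added to latent music, conditioned on motion features."""
    def __init__(self, latent_dim=128, cond_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 512),
            nn.SiLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, noisy_latent, motion_feats, t):
        # noisy_latent: (B, T, latent_dim), motion_feats: (B, T, cond_dim), t: (B,)
        t_embed = t.view(-1, 1, 1).expand(-1, noisy_latent.size(1), 1)
        x = torch.cat([noisy_latent, motion_feats, t_embed], dim=-1)
        return self.net(x)

# One simplified DDPM-style training step: corrupt clean music latents with
# noise and train the denoiser to predict that noise given the dance motion.
encoder, denoiser = MotionEncoder(), LatentMusicDenoiser()
motion = torch.randn(4, 120, 72)              # dummy dance pose sequence
music_latent = torch.randn(4, 120, 128)       # dummy music latents (e.g., from a pretrained autoencoder)
t = torch.rand(4)                             # diffusion timestep in [0, 1)
noise = torch.randn_like(music_latent)
alpha = (1 - t).view(-1, 1, 1)                # toy linear noise schedule
noisy = alpha.sqrt() * music_latent + (1 - alpha).sqrt() * noise
pred = denoiser(noisy, encoder(motion), t)
loss = nn.functional.mse_loss(pred, noise)
loss.backward()

At inference time, the same denoiser would be applied iteratively to pure noise, conditioned on the encoded dance motion, and the resulting latents decoded back into a waveform by the pretrained audio autoencoder; this sketch only covers the training objective.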

Keywords