Dynamic Fashion Video Synthesis from Static Imagery

Tasin Islam; Alina Miron; Xiaohui Liu; Yongmin Li

doi:10.3390/fi16080287

Future Internet (Aug 2024)

Affiliations

Tasin Islam: Department of Computer Science, Brunel University London, London UB8 3PH, UK
Alina Miron: Department of Computer Science, Brunel University London, London UB8 3PH, UK
Xiaohui Liu: Department of Computer Science, Brunel University London, London UB8 3PH, UK
Yongmin Li: Department of Computer Science, Brunel University London, London UB8 3PH, UK

DOI: https://doi.org/10.3390/fi16080287
Journal volume & issue: Vol. 16, no. 8, p. 287

Abstract

Online shopping for clothing has become increasingly popular, but it comes with its own set of challenges. For example, it can be difficult for customers to make informed purchase decisions without trying on the clothes to see how they move and flow. We address this issue by introducing FashionFlow, a new image-to-video generator that produces fashion videos showing how clothing products move and flow on a person. By utilising a latent diffusion model together with several other components, we are able to synthesise a high-fidelity video conditioned on a fashion image. These components include pseudo-3D convolution, a VAE, CLIP, a frame interpolator and attention, which together generate a smooth video efficiently while preserving vital characteristics of the conditioning image. The contribution of our work is the creation of a model that can synthesise videos from images. We show how we use a pre-trained VAE decoder to process the latent space and generate a video. We demonstrate the effectiveness of our local and global conditioners, which help preserve the maximum amount of detail from the conditioning image. Our model is unique because it produces spontaneous and believable motion using only one image, whereas other diffusion models are either text-to-video or image-to-video models that rely on pre-recorded pose sequences. Overall, our research demonstrates a successful synthesis of fashion videos featuring models posing from various angles, showcasing the movement of the garment. Our findings hold great promise for improving the online fashion industry’s shopping experience.

Published in Future Internet

ISSN: 1999-5903 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/futureinternet/
