EURASIP Journal on Audio, Speech, and Music Processing (Oct 2023)

YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation

  • Le Ma,
  • Xinda Wu,
  • Ruiyuan Tang,
  • Chongjun Zhong,
  • Kejun Zhang

DOI
https://doi.org/10.1186/s13636-023-00306-6
Journal volume & issue
Vol. 2023, no. 1
pp. 1 – 13

Abstract

Appropriate background music in e-commerce advertisements can help stimulate consumption and build product image. However, many factors, such as emotion and product category, must be taken into account, which makes manual music selection time-consuming and dependent on professional knowledge, so automatically recommending music for video becomes crucial. Because no e-commerce advertisement dataset exists, we first establish a large-scale dataset, Commercial-98K, which covers major e-commerce categories. We then propose a video-music retrieval model, YuYin, to learn the correlation between video and music. We introduce a weighted fusion module (WFM) that fuses emotion features and audio features from music to obtain a more fine-grained music representation. Considering the similarity of music within the same product category, YuYin is trained with multi-task learning: it explores the correlation between video and music by cross-matching video, music, and tag, alongside a category prediction task. Extensive experiments show that YuYin achieves a remarkable improvement in video-music retrieval on Commercial-98K.
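
The abstract does not specify how the WFM or the multi-task objective is computed, so the following is only an illustrative sketch of the general idea, not the YuYin implementation. It assumes that fusion is a softmax-weighted sum of two equal-length feature vectors and that the training loss is a matching loss plus a weighted category-prediction loss. The names `weighted_fusion`, `multi_task_loss`, `gate_scores`, and `category_weight` are hypothetical.

```python
import math


def softmax(scores):
    """Normalise raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(emotion_feat, audio_feat, gate_scores):
    """Fuse two equal-length feature vectors with softmax-normalised weights.

    Assumption: gate_scores holds two raw scores (emotion, audio) that
    would be learned in a real model; here they are passed in by hand.
    """
    if len(emotion_feat) != len(audio_feat):
        raise ValueError("feature vectors must have the same length")
    w_emotion, w_audio = softmax(gate_scores)
    return [w_emotion * e + w_audio * a
            for e, a in zip(emotion_feat, audio_feat)]


def multi_task_loss(matching_loss, category_loss, category_weight=0.5):
    """Combine a cross-modal matching loss with a category-prediction loss.

    Assumption: a simple weighted sum; the weighting used in the paper
    is not stated in the abstract.
    """
    return matching_loss + category_weight * category_loss


# Equal gate scores give each modality the same weight.
fused = weighted_fusion([1.0, 0.0], [0.0, 1.0], gate_scores=[0.0, 0.0])
total = multi_task_loss(matching_loss=1.0, category_loss=2.0)
```

In this sketch, raising the first gate score shifts the fused representation toward the emotion features, which is one simple way a model could trade off the two sources of information about a music track.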

Keywords