IEEE Access (Jan 2020)

A Multimodal End-to-End Deep Learning Architecture for Music Popularity Prediction

  • David Martin-Gutierrez,
  • Gustavo Hernandez Penaloza,
  • Alberto Belmonte-Hernandez,
  • Federico Alvarez Garcia

DOI
https://doi.org/10.1109/ACCESS.2020.2976033
Journal volume & issue
Vol. 8
pp. 39361 – 39374

Abstract

The continuous evolution of multimedia applications is fostering applied research aimed at dynamically enhancing the services provided by platforms such as Spotify, Lastfm, or Billboard. Consequently, developing innovative methods for retrieving specific information from large volumes of music-related data arises as a challenge within the Music Information Retrieval (MIR) framework. Moreover, despite the existence of several music datasets, there is still a lack of information for accurately estimating the impact or popularity of a song within a platform. Furthermore, these platforms measure popularity in different ways, which makes it harder to build generalized and comparable models. In this paper, the SpotGenTrack Popularity Dataset (SPD) is presented as an alternative to existing datasets that will help researchers compare and promote their models. In addition, an innovative multimodal end-to-end Deep Learning architecture named HitMusicNet is presented for predicting the popularity of music recordings. The experiments conducted show that the proposed architecture outperforms previous state-of-the-art studies by incorporating three main modalities into the analysis (audio, lyrics, and metadata), as well as a preliminary compression stage via an autoencoder that improves the model's capability to predict popularity.

Keywords