Multimodal Deep Learning for Music Genre Classification

Sergio Oramas; Francesco Barbieri; Oriol Nieto; Xavier Serra

doi:10.5334/tismir.10

Transactions of the International Society for Music Information Retrieval (Sep 2018)

Affiliations

Sergio Oramas: Music Technology Group, Universitat Pompeu Fabra, Barcelona, ES; Pandora Media Inc., 94612 Oakland
Francesco Barbieri: TALN Group, Universitat Pompeu Fabra, Barcelona
Oriol Nieto: Pandora Media Inc., 94612 Oakland
Xavier Serra: Music Technology Group, Universitat Pompeu Fabra, Barcelona

DOI: https://doi.org/10.5334/tismir.10
Journal volume & issue: Vol. 1, no. 1, pp. 4–21

Abstract

Music genre labels are useful for organizing songs, albums, and artists into broader groups that share similar musical characteristics. In this work, an approach to learning and combining multimodal data representations for music genre classification is proposed. Intermediate representations of deep neural networks are learned from audio tracks, text reviews, and cover art images, and then combined for classification. Experiments on single-label and multi-label genre classification are carried out, evaluating the effect of the different learned representations and their combinations. Results in both experiments show that aggregating learned representations from different modalities improves classification accuracy, suggesting that different modalities embed complementary information. In addition, learning a multimodal feature space increases the performance of pure audio representations, which may be especially relevant when the other modalities are available for training but not at prediction time. Moreover, a proposed approach for dimensionality reduction of target labels yields major improvements in multi-label classification, not only in terms of accuracy but also in terms of the diversity of the predicted genres, which implies a more fine-grained categorization. Finally, a qualitative analysis of the results sheds some light on the behavior of the different modalities in the classification task.

Published in Transactions of the International Society for Music Information Retrieval

ISSN: 2514-3298 (Online)
Publisher: Ubiquity Press
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Music and books on Music: Music
Website: https://transactions.ismir.net/
