Creating DALI, a Large Dataset of Synchronized Audio, Lyrics, and Notes

Gabriel Meseguer-Brocal; Alice Cohen-Hadria; Geoffroy Peeters

doi:10.5334/tismir.30

Transactions of the International Society for Music Information Retrieval (Jun 2020)

Affiliations

Gabriel Meseguer-Brocal: Ircam Lab, CNRS, Sorbonne Université, Ministère de la Culture, Paris
Alice Cohen-Hadria: Ircam Lab, CNRS, Sorbonne Université, Ministère de la Culture, Paris
Geoffroy Peeters: LTCI, Institut Polytechnique de Paris, Paris

DOI: https://doi.org/10.5334/tismir.30
Journal volume & issue: Vol. 3, no. 1

Abstract

The DALI dataset is a large dataset of time-aligned symbolic vocal melody notations (notes) and lyrics at four levels of granularity. DALI contains 5358 songs in its first version and 7756 in its second. In this article, we present the dataset, describe the tools developed to work with the data, and detail the approach used to build it. Our method is motivated by active learning and the teacher-student paradigm. We establish a loop in which dataset creation and model learning interact, each benefiting the other: we progressively improve our model using the collected data and, in turn, correct and enhance the collected data every time we update the model. Each iteration of this process yields an improved DALI dataset. Finally, we outline the errors still present in the dataset and propose solutions to global issues. We believe that DALI can encourage other researchers to explore the interaction between model learning and dataset creation, rather than regarding them as independent tasks.

Published in Transactions of the International Society for Music Information Retrieval

ISSN: 2514-3298 (Online)
Publisher: Ubiquity Press
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Music and books on Music: Music
Website: https://transactions.ismir.net/
