Transactions of the International Society for Music Information Retrieval (Aug 2024)
The Billboard Melodic Music Dataset (BiMMuDa)
Abstract
We introduce the Billboard Melodic Music Dataset (BiMMuDa), which contains the lead vocal melodies of the top five songs of each year from 1950 to 2022 according to the Billboard year-end singles charts. This article describes the dataset’s compilation process and attributes in detail. The melody of each of the 371 songs was manually transcribed in full, yielding 371 MIDI (Musical Instrument Digital Interface) files; melodies from the songs’ individual sections (e.g., verses, choruses) were then exported into separate files, yielding an additional 1,133 MIDI files of shorter melodies. Lyrics to the songs are provided separately from the melodic transcriptions. The article also includes comprehensive descriptions and graphical representations of the metadata available per song and per melody. An analysis of verse and chorus melodies revealed structural differences between them: chorus melodies have significantly fewer notes and lower note density, but larger melodic intervals on average. Whether added to existing datasets or used on its own, BiMMuDa can serve as ground-truth data for a variety of music information retrieval (MIR) tasks and provide insight into Western pop melody.
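As a rough illustration of the three quantities compared between verse and chorus melodies (note count, note density, and average melodic interval size), the following minimal Python sketch computes them for a monophonic melody. The note representation, the `melody_stats` helper, and the toy melodies are all hypothetical and are not taken from the dataset or its tooling. Note density is assumed here to mean notes per beat and interval size to mean the absolute pitch difference in semitones between consecutive notes; the article's exact definitions may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical note representation: (onset_in_beats, duration_in_beats, midi_pitch).
Note = Tuple[float, float, int]


def melody_stats(notes: List[Note]) -> Dict[str, float]:
    """Return note count, note density, and mean absolute interval for a monophonic melody."""
    if not notes:
        return {"num_notes": 0, "note_density": 0.0, "mean_abs_interval": 0.0}

    # Order notes by onset so that consecutive pairs form melodic intervals.
    ordered = sorted(notes, key=lambda n: n[0])

    # Melody span in beats: first onset to the latest note ending.
    start = ordered[0][0]
    end = max(onset + dur for onset, dur, _ in ordered)
    span = end - start

    # Absolute pitch differences (in semitones) between consecutive notes.
    pitches = [pitch for _, _, pitch in ordered]
    intervals = [abs(b - a) for a, b in zip(pitches, pitches[1:])]

    return {
        "num_notes": len(ordered),
        # Assumption: density is measured in notes per beat.
        "note_density": len(ordered) / span if span > 0 else 0.0,
        "mean_abs_interval": sum(intervals) / len(intervals) if intervals else 0.0,
    }


# Toy melodies invented for illustration only; they are not from BiMMuDa.
verse = [(0.0, 0.5, 60), (0.5, 0.5, 62), (1.0, 0.5, 64),
         (1.5, 0.5, 62), (2.0, 1.0, 60), (3.0, 1.0, 62)]
chorus = [(0.0, 1.0, 60), (1.0, 1.0, 67), (2.0, 2.0, 64)]

verse_stats = melody_stats(verse)
chorus_stats = melody_stats(chorus)
print(verse_stats)
print(chorus_stats)
```

In this toy case the chorus has fewer notes and a lower density than the verse but a larger mean interval, which mirrors the direction of the reported differences. It demonstrates only how the measures relate and is not evidence for the finding itself.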
Keywords