Empirical Musicology Review (Jan 2024)

An Annotated Corpus of Tonal Piano Music from the Long 19th Century

  • Johannes Hentschel,
  • Yannis Rammos,
  • Fabian C. Moss,
  • Markus Neuwirth,
  • Martin Rohrmeier

DOI
https://doi.org/10.18061/emr.v18i1.8903
Journal volume & issue
Vol. 18, no. 1
pp. 84 – 95

Abstract

We present a dataset of 264 annotated piano pieces by nine composers, written during the long 19th century (https://doi.org/10.5281/zenodo.7483349). Annotations adhere to the DCML harmony annotation standard and include Roman numerals, phrase boundaries, and cadence types. The scores are encoded in the XML-based MuseScore 3 format, with the annotations embedded in the MuseScore files. In addition, all harmony information, alongside key features of the encoded measure and note objects, is provided as plaintext TSV tables for increased interoperability with other datasets and analysis tools. Annotations were collaboratively created and reviewed by a pool of trained music theorists. Collaboration took place asynchronously online via a semi-automated GitHub-based workflow designed for quality assurance, allowing cycles of revision and review until consensus was reached. The full revision history is retained, providing data for further empirical research on inter-annotator agreement and related topics. We also present descriptive statistics about the nine corpora and the dataset as a whole, including comparisons of pitch-class contents, phrase lengths, modulations, and cadence types. We conclude with a discussion of our musicological principles for corpus building and considerations of representability.
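
Because the harmony information is distributed as plain TSV tables, it can be read with standard tooling. The following is a minimal sketch of tallying cadence types from such a table using only the Python standard library. The column names (`mc`, `chord`, `cadence`), the sample rows, and the helper `count_cadences` are illustrative assumptions and are not taken from the dataset; the actual column layout should be checked against the published files.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt standing in for one of the dataset's TSV tables.
# Column names and values are illustrative assumptions, not corpus data.
SAMPLE_TSV = (
    "mc\tchord\tcadence\n"
    "1\tI\t\n"
    "2\tV7\t\n"
    "3\tI\tPAC\n"
    "4\tii6\t\n"
    "5\tV\tHC\n"
    "6\tI\tPAC\n"
)


def count_cadences(tsv_file):
    """Count the non-empty values in the (assumed) 'cadence' column."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row["cadence"] for row in reader if row["cadence"])


# An in-memory file keeps the sketch self-contained; a real table would be
# opened with open(path, newline="", encoding="utf-8") instead.
cadence_counts = count_cadences(io.StringIO(SAMPLE_TSV))
print(cadence_counts.most_common())
```

The same pattern would extend to comparing cadence distributions across composers, provided each table is read separately and the counts are kept per corpus.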

Keywords