Empirical Musicology Review (Jul 2016)

The Yale-Classical Archives Corpus

  • Christopher William White,
  • Ian Quinn

DOI
https://doi.org/10.18061/emr.v11i1.4958
Journal volume & issue
Vol. 11, no. 1
pp. 50 – 58

Abstract

The Yale-Classical Archives Corpus (YCAC) contains harmonic and rhythmic information for a dataset of Western European Classical art music. This corpus is based on data from classicalarchives.com, a repository of thousands of user-generated MIDI representations of pieces from several periods of Western European music history. The YCAC makes available metadata for each MIDI file, as well as a list of the pitch simultaneities ("salami slices") in that file. Metadata include the piece's composer, the composer's country of origin, date of composition, genre (e.g., symphony, piano sonata, nocturne), instrumentation, meter, and key. A processing step groups the file's pitches into vertical slices, starting a new slice each time a pitch is added to or removed from the texture. For each slice it records the offset (measured in the number of quarter notes separating the event from the file's beginning), highest pitch, lowest pitch, prime form, scale degrees in relation to the global key (as determined by experts), and local key information (as determined by a windowed key-profile analysis). The corpus contains 13,769 MIDI files by 571 composers, yielding over 14,051,144 vertical slices. This paper outlines several properties of this corpus, along with a representative study using this dataset.
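The slicing step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Note` and `Slice` types and the `salami_slice` function are hypothetical names introduced here, the input is assumed to be already-parsed note events (MIDI pitch number, onset, and release in quarter notes), and prime form, scale degrees, and key information are omitted.

```python
from typing import List, NamedTuple, Optional, Tuple


class Note(NamedTuple):
    """Hypothetical note event; times are in quarter notes from the file start."""
    pitch: int      # MIDI note number
    onset: float    # time at which the pitch enters the texture
    release: float  # time at which the pitch leaves the texture


class Slice(NamedTuple):
    """Hypothetical record for one vertical slice."""
    offset: float               # quarter notes from the file's beginning
    pitches: Tuple[int, ...]    # distinct sounding pitches, low to high
    highest: Optional[int]      # None when nothing is sounding
    lowest: Optional[int]


def salami_slice(notes: List[Note]) -> List[Slice]:
    """Start a new slice at every time a pitch is added or removed."""
    # Every onset and every release is a potential slice boundary.
    boundaries = sorted({t for n in notes for t in (n.onset, n.release)})
    slices = []
    for t in boundaries:
        # A note sounds at t if it has started and has not yet been released.
        sounding = tuple(sorted({n.pitch for n in notes
                                 if n.onset <= t < n.release}))
        slices.append(Slice(
            offset=t,
            pitches=sounding,
            highest=sounding[-1] if sounding else None,
            lowest=sounding[0] if sounding else None,
        ))
    return slices


if __name__ == "__main__":
    # C4 held for two quarter notes; E4 gives way to G4 after one.
    example = [Note(60, 0.0, 2.0), Note(64, 0.0, 1.0), Note(67, 1.0, 2.0)]
    for s in salami_slice(example):
        print(s)
```

In this sketch a slice with no sounding pitches (a rest) is kept as an empty slice, and a re-articulated pitch produces a new slice with the same pitch content; whether the corpus treats these cases the same way is not stated in the abstract.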

Keywords