Transactions of the International Society for Music Information Retrieval (Sep 2018)

Learning Audio–Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification

  • Matthias Dorfer,
  • Jan Hajič jr.,
  • Andreas Arzt,
  • Harald Frostel,
  • Gerhard Widmer

DOI
https://doi.org/10.5334/tismir.12
Journal volume & issue
Vol. 1, no. 1
pp. 22–33

Abstract

This work addresses the problem of matching musical audio directly to sheet music, without any higher-level abstract representation. We propose a method that learns joint embedding spaces for short excerpts of audio and their respective counterparts in sheet music images, using multimodal convolutional neural networks. Given the learned representations, we show how to utilize them for two sheet-music-related tasks: (1) piece/score identification from audio queries and (2) retrieving relevant performances given a score as a search query. All retrieval models are trained and evaluated on a new, large-scale multimodal audio–sheet music dataset, which is made publicly available along with this article. The dataset comprises 479 precisely annotated solo piano pieces by 53 composers, for a total of 1,129 pages of music and about 15 hours of aligned audio, which was synthesized from these scores. Going beyond this synthetic training data, we carry out first retrieval experiments using scans of real sheet music of high complexity (e.g., nearly the complete solo piano works by Frédéric Chopin) and commercial recordings by famous concert pianists. Our results suggest that the proposed method, in combination with the large-scale dataset, yields retrieval models that successfully generalize to data well beyond the synthetic training data used for model building.
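To make the retrieval idea concrete, the following is a minimal sketch of piece identification in a joint embedding space. It is not the authors' implementation: the embeddings are hand-made toy vectors standing in for network outputs, and the names `cosine`, `nearest_piece`, and `identify_piece`, the use of cosine similarity, and the majority-vote aggregation over excerpts are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_piece(query, index):
    """Return the piece id of the indexed excerpt most similar to the query.

    `index` is a list of (piece_id, embedding) pairs, here standing in for
    embedded sheet music excerpts (hypothetical data layout).
    """
    return max(index, key=lambda item: cosine(query, item[1]))[0]


def identify_piece(audio_embeddings, sheet_index):
    """Let each audio excerpt vote for its nearest sheet excerpt's piece.

    Majority voting is an assumed aggregation rule for this sketch.
    """
    votes = Counter(nearest_piece(q, sheet_index) for q in audio_embeddings)
    return votes.most_common(1)[0][0]


# Toy index: two sheet music excerpts per piece, already embedded.
sheet_index = [
    ("piece_a", [1.0, 0.0, 0.0]),
    ("piece_a", [0.9, 0.1, 0.0]),
    ("piece_b", [0.0, 1.0, 0.0]),
    ("piece_b", [0.0, 0.9, 0.1]),
]

# Toy audio query: three embedded excerpts from one recording.
audio_query = [[0.1, 0.95, 0.0], [0.0, 0.8, 0.2], [0.7, 0.3, 0.0]]

result = identify_piece(audio_query, sheet_index)
print(result)
```

The second task (retrieving performances for a given score) would follow the same pattern with the roles swapped: sheet music excerpts act as queries against an index of embedded audio excerpts.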

Keywords