Transactions of the International Society for Music Information Retrieval (Jan 2024)

Real World Music Object Recognition

  • Lukas Tuggener,
  • Raphael Emberger,
  • Adhiraj Ghosh,
  • Pascal Sager,
  • Yvan Putra Satyawan,
  • Javier Montoya,
  • Simon Goldschagg,
  • Florian Seibold,
  • Urs Gut,
  • Philipp Ackermann,
  • Jürgen Schmidhuber,
  • Thilo Stadelmann

DOI
https://doi.org/10.5334/tismir.157
Journal volume & issue
Vol. 7, no. 1
pp. 1–14

Abstract

We present solutions to two of the most pressing issues in contemporary optical music recognition (OMR). We improve recognition accuracy on low-quality, real-world input data (e.g. containing ageing, lighting, or dirt artefacts) and provide confidence-rated model outputs to enable efficient human post-processing. Specifically, we present (i) a sophisticated input augmentation scheme that can reduce the gap between sanitised benchmarks and realistic tasks through a combination of synthetic data and noisy perturbations of real-world documents; (ii) an adversarial discriminative domain adaptation method that can be employed to improve the performance of OMR systems on low-quality data; and (iii) a combination of model ensembles and prediction fusion, which generates trustworthy confidence ratings for each prediction. We evaluate our contributions on a newly created test set consisting of manually annotated pages of varying real-world quality, sourced from the International Music Score Library Project (IMSLP)/Petrucci Music Library. With the presented data augmentation scheme, we achieve a doubling in detection performance from 36.0% to 73.3% on noisy real-world data compared to state-of-the-art training. This result is then combined with robust confidence ratings, paving the way for OMR to be deployed in the real world. Additionally, we show the merits of unsupervised adversarial domain adaptation for OMR, raising the 36.0% baseline to 48.9%. All our code and data are freely available at: https://github.com/raember/s2anet/tree/TISMIR_publication.

Keywords