Frontiers in Psychology (Apr 2019)

Phrase-Level Modeling of Expression in Violin Performances

  • Fábio J. M. Ortega,
  • Sergio I. Giraldo,
  • Alfonso Perez,
  • Rafael Ramírez

DOI
https://doi.org/10.3389/fpsyg.2019.00776
Journal volume & issue
Vol. 10

Abstract

Background: Expression is a key skill in music performance, and one that is difficult to address in music lessons. Computational models that learn from expert performances can help provide suggestions and feedback to students.

Aim: We propose and analyze an approach to modeling variations in dynamics and note onset timing for solo violin pieces, with the purpose of facilitating expressive performance learning in new pieces for which no reference performance is available.

Method: The method generates phrase-level predictions based on musical score information, on the assumption that expressiveness is idiomatic and thus influenced by similar-sounding melodies. Predictions were evaluated numerically using three different datasets and against note-level machine-learning models, and also perceptually by listeners, who were presented with synthesized versions of musical excerpts and asked to choose the most human-sounding one. Some of the presented excerpts were synthesized to reflect the variations in dynamics and timing predicted by the model, others were shaped to reflect the dynamics and timing of an actual expert performance, and a third group was presented with no expressive variations.

Results: Surprisingly, none of the three synthesized versions was consistently selected as human-like or preferred by listeners with statistical significance. Possible interpretations of these results are that the melodies might have been impossible to interpret outside their musical context, or that expressive features left out of the modeling, such as note articulation and vibrato, are in fact essential to the perception of expression in violin performance. Positive feedback from some listeners toward the modeled melodies in a blind setting indicates that the modeling approach was capable of generating appropriate renditions for at least a subset of the data. Numerically, phrase-level modeling performs slightly worse than note-level modeling, but it produces predictions that are easier to interpret visually and are thus more useful in a pedagogical setting.

Keywords