Transactions of the International Society for Music Information Retrieval (Nov 2021)
On Evaluation of Inter- and Intra-Rater Agreement in Music Recommendation
Abstract
Our work is concerned with the subjective perception of music similarity in the context of music recommendation. We present two user studies that explore inter- and intra-rater agreement in the quantification of general similarity between pieces of recommended music. In contrast to previous efforts, our test participants are of more uniform age and share a comparable musical background, in order to lower variation within the participant group. The first study uses carefully curated song material from five distinct genres, while the second uses songs from a single genre only; almost all songs in both studies were previously unknown to the test participants. Repeating the listening tests with a two-week lag shows that intra-rater agreement is higher than inter-rater agreement in both studies. Agreement in the single-genre study is lower, since the genre of songs seems to be a major factor in judging similarity between songs. The mood of raters at test time is found to influence intra-rater agreement. We discuss the impact of our results on the evaluation of music recommenders and question the validity of experiments on general music similarity.
Keywords