Transactions of the International Society for Music Information Retrieval (Dec 2021)
‘We are Not Groupies… We are Band Aids’: Assessment Reliability in the AI Song Contest
Abstract
In 2020, inspired by the expectation that Rotterdam would host the Eurovision Song Contest, the Dutch public broadcaster VPRO sponsored an international AI Song Contest. The winner was determined by combining an online public vote, which attracted 3800 voters across 70 countries, with the ratings of three professional judges. In this paper, we analyse the voters’ and judges’ ratings to assess the reliability of the contest results and to make recommendations for evaluating the contest in the future. We focus on Rasch-type models because of their strong measurement characteristics, but also consider a mixture variant that inflates counts to account for the 46 percent of voters who exhibited ‘groupie’-like behaviour: voting for one team only and giving that team a perfect score. We find that the overall reliability of the AI Song Contest evaluation was excellent (ρ = .90) but that the large number of one-time voters distorted the results. These findings pose a dilemma for organising such a contest in the future: to what extent is a popularity contest desirable and even expected from a broader voting public, and to what extent should such a contest strive for an objective measurement of the quality of AI-composed music?
Keywords