Applied Sciences (Jul 2024)

Verse1-Chorus-Verse2 Structure: A Stacked Ensemble Approach for Enhanced Music Emotion Recognition

  • Love Jhoye Moreno Raboy
  • Attaphongse Taparugssanagorn

DOI
https://doi.org/10.3390/app14135761
Journal volume & issue
Vol. 14, no. 13
p. 5761

Abstract

In this study, we present a novel approach to music emotion recognition that uses a stacked ensemble of models integrating audio and lyric features within a structured song framework. Our methodology employs six specialized base models, each designed to capture critical features from one of three song segments: verse1, chorus, and verse2. The outputs of these models are combined by a meta-learner, which achieves an accuracy of 96.25%. For comparison, a basic stacked ensemble model was also run on the audio and lyric features of each song segment independently. The six-input stacked ensemble model outperforms the models that analyze song parts in isolation. This pronounced improvement underscores the importance of a bimodal approach in capturing the full spectrum of musical emotions. Our research opens new avenues for studying musical emotions and provides a foundational framework for future investigations into the complex emotional aspects of music.
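
The six-input stacking idea described above can be illustrated with a minimal sketch. The abstract does not specify the base learners, the features, the emotion classes, or the training protocol, so everything below is an assumption made for illustration: a toy nearest-centroid classifier stands in for both the base models and the meta-learner, the feature vectors are synthetic, the labels "happy" and "sad" are hypothetical, and a simple holdout split is used to train the meta-learner on base-model outputs.

```python
import random

SEGMENTS = ("verse1", "chorus", "verse2")
MODALITIES = ("audio", "lyrics")
# One input view per (segment, modality) pair: six in total.
VIEWS = [(seg, mod) for seg in SEGMENTS for mod in MODALITIES]


class NearestCentroid:
    """Toy classifier (an assumption, not the paper's model).

    Stores one mean vector per class; a score is the negative squared
    distance to that class centroid, so a higher score means a closer match.
    """

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids_[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def scores(self, x):
        return [-sum((a - b) ** 2 for a, b in zip(x, self.centroids_[c]))
                for c in self.classes_]

    def predict(self, x):
        s = self.scores(x)
        return self.classes_[s.index(max(s))]


class StackedEnsemble:
    """One base model per view; a meta-learner trained on their scores."""

    def __init__(self, views):
        self.views = list(views)
        self.base = {v: NearestCentroid() for v in self.views}
        self.meta = NearestCentroid()

    def meta_features(self, song):
        # Concatenate the class scores produced by every base model.
        feats = []
        for v in self.views:
            feats.extend(self.base[v].scores(song[v]))
        return feats

    def fit(self, base_songs, base_labels, meta_songs, meta_labels):
        # Base models see one split; the meta-learner is fit on base-model
        # outputs for a different split, so it does not learn from scores
        # computed on the base models' own training songs.
        for v in self.views:
            self.base[v].fit([s[v] for s in base_songs], base_labels)
        self.meta.fit([self.meta_features(s) for s in meta_songs], meta_labels)
        return self

    def predict(self, song):
        return self.meta.predict(self.meta_features(song))


def make_song(label, rng):
    """Synthetic 3-dimensional features per view (hypothetical data)."""
    shift = 2.0 if label == "happy" else -2.0
    return {v: [rng.gauss(shift, 1.0) for _ in range(3)] for v in VIEWS}


rng = random.Random(0)
labels = ["happy", "sad"] * 30
songs = [make_song(label, rng) for label in labels]

model = StackedEnsemble(VIEWS).fit(songs[:20], labels[:20],
                                   songs[20:40], labels[20:40])
held_out = list(zip(songs[40:], labels[40:]))
accuracy = sum(model.predict(s) == l for s, l in held_out) / len(held_out)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects only the well-separated synthetic data and says nothing about the reported 96.25%. Restricting `VIEWS` to a single segment or a single modality gives a rough analogue of the per-segment comparison mentioned in the abstract.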

Keywords