Applied Sciences (Nov 2024)

Hierarchical Residual Attention Network for Musical Instrument Recognition Using Scaled Multi-Spectrogram

  • Rujia Chen,
  • Akbar Ghobakhlou,
  • Ajit Narayanan

DOI
https://doi.org/10.3390/app142310837
Journal volume & issue
Vol. 14, no. 23
p. 10837

Abstract

Musical instrument recognition is a relatively unexplored area of machine learning because it requires analyzing complex spatial–temporal audio features. Traditional methods that rely on a single representation, such as STFT, Log-Mel, or MFCC, often fail to capture the full range of features. Here, we propose a hierarchical residual attention network that takes a scaled combination of multiple spectrograms, including STFT, Log-Mel, MFCC, and CST features (Chroma, Spectral contrast, and Tonnetz), to create a comprehensive sound representation. The model uses attention mechanisms to focus on the relevant parts of the spectrograms. Experimental results on the OpenMIC-2018 dataset show a significant improvement in classification accuracy, especially with the “Magnified 1/4 Size” configuration. Future work will optimize CST feature scaling, explore advanced attention mechanisms, and apply the model to other audio tasks to assess its generalizability.
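
The idea of combining several time-frequency representations into one input can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the features are scaled or arranged, so the min-max normalization, the nearest-neighbour row resizing, the vertical stacking, the helper names (`min_max`, `resize_rows`, `stack_features`), the matrix sizes, and the target heights below are all assumptions chosen for illustration. Random arrays stand in for real features.

```python
import numpy as np

def min_max(x):
    """Rescale a feature matrix to the [0, 1] range (assumed normalization)."""
    x = np.asarray(x, dtype=np.float64)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span

def resize_rows(x, n_rows):
    """Nearest-neighbour resize along the feature (row) axis."""
    idx = np.floor(np.arange(n_rows) * x.shape[0] / n_rows).astype(int)
    return x[idx, :]

def stack_features(features, target_rows):
    """Normalize each matrix, resize it to its target height, stack vertically."""
    parts = []
    for name, mat in features.items():
        parts.append(resize_rows(min_max(mat), target_rows[name]))
    return np.vstack(parts)

# Placeholder feature matrices (rows = feature bins, columns = time frames).
rng = np.random.default_rng(0)
n_frames = 100
features = {
    "stft": rng.random((513, n_frames)),
    "log_mel": rng.random((128, n_frames)),
    "mfcc": rng.random((20, n_frames)),
    # Assumed CST layout: 12 chroma + 7 spectral contrast + 6 tonnetz rows.
    "cst": rng.random((25, n_frames)),
}

# Hypothetical target heights; the paper's configurations may differ.
target_rows = {"stft": 128, "log_mel": 128, "mfcc": 64, "cst": 64}

combined = stack_features(features, target_rows)
print(combined.shape)  # (384, 100)
```

Magnifying the low-dimensional CST block before stacking keeps it from being visually dominated by the much taller STFT and Log-Mel blocks, which is one plausible reason a scaling step could matter for a convolutional model.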

Keywords