IEEE Access (Jan 2024)

Deep Semantic Feature Extraction to Overcome Overlapping Frequencies for Instrument Recognition in Indonesian Traditional Music Orchestras

  • Dewi Nurdiyah,
  • Eko Mulyanto Yuniarno,
  • Sari Ayu Wulandari,
  • Yoyon Kusnendar Surapto,
  • Mauridhi Hery Purnomo

DOI
https://doi.org/10.1109/ACCESS.2024.3401699
Journal volume & issue
Vol. 12
pp. 76936 – 76954

Abstract

In Indonesian traditional music, specifically Gamelan, the fundamental frequencies of different instruments overlap because certain tones are tuned to the same octaves. This becomes challenging when the instruments are played simultaneously in an orchestra, producing mixed frequencies. Using a Gamelan music dataset, this study addresses the issue by extracting deep semantic features that capture the distinctive characteristics of each instrument in the orchestra. We propose fusing a Multi-Task Learning Autoencoder (MTL-AE) with an Affine Transformation (AFT) to extract deep semantic features. We also investigate which input, the Log Mel Spectrogram or the Mel Frequency Cepstral Coefficients (MFCC), is optimal. The MTL-AE simultaneously extracts deep semantic features for the eight instruments in the orchestra, and the AFT preserves these features according to instrument class. To identify the optimal extraction method, the proposed method was compared with baseline methods based on MHU-Net and on MHU-Net enhanced with Feature-wise Linear Modulation (FiLM). The deep semantic features of all instruments are then arranged into structured feature patterns for the eight instrument sources in the orchestra. Machine learning classifiers use these structured deep semantic features to recognize the instruments. Performance was compared against features derived from the vanilla Log Mel Spectrogram, MFCC, Principal Component Analysis (PCA), Modified ResNet-50, MobileNet V3, and YAMNET. The results show that deep semantic features extracted by the proposed method with MFCC input yield structured deep semantic features that achieve superior accuracy, up to 99%. Hence, these features effectively overcome the problem of overlapping frequencies in musical orchestras.
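The abstract does not give the AFT equations or the layout of the structured features. The sketch below makes one assumption: AFT acts like a FiLM-style, class-conditioned scale-and-shift, z' = γ_k ⊙ z + β_k for instrument k. Under that assumption, it shows how per-instrument latent vectors could be transformed and then concatenated in a fixed instrument order into one structured vector for a downstream classifier. The latent size, the names `gamma`, `beta`, `affine_transform` and `structured_feature`, and the random values are hypothetical stand-ins for learned quantities. This is not the authors' implementation.

```python
import numpy as np

NUM_INSTRUMENTS = 8   # eight instrument sources, as stated in the abstract
LATENT_DIM = 16       # hypothetical latent size per instrument

rng = np.random.default_rng(0)

# Hypothetical per-class affine parameters; in a trained model these would be learned.
gamma = rng.normal(1.0, 0.1, size=(NUM_INSTRUMENTS, LATENT_DIM))
beta = rng.normal(0.0, 0.1, size=(NUM_INSTRUMENTS, LATENT_DIM))


def affine_transform(latent: np.ndarray, instrument: int) -> np.ndarray:
    """Scale and shift one latent vector with its instrument class's parameters (assumed FiLM-style AFT)."""
    return gamma[instrument] * latent + beta[instrument]


def structured_feature(latents: np.ndarray) -> np.ndarray:
    """Transform each instrument's latent with its own class parameters and
    concatenate the results in a fixed instrument order into one flat vector."""
    if latents.shape != (NUM_INSTRUMENTS, LATENT_DIM):
        raise ValueError("expected one latent vector per instrument")
    parts = [affine_transform(latents[k], k) for k in range(NUM_INSTRUMENTS)]
    return np.concatenate(parts)


# Usage: random stand-ins for latents produced by a shared multi-task encoder.
latents = rng.normal(size=(NUM_INSTRUMENTS, LATENT_DIM))
x = structured_feature(latents)
print(x.shape)  # (128,)
```

Because the concatenation order is fixed, each instrument's features always occupy the same slice of the vector. This consistent layout is what a generic classifier would rely on under this sketch's assumptions.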

Keywords