Applied Sciences (Sep 2021)

Capturing Discriminative Information Using a Deep Architecture in Acoustic Scene Classification

  • Hye-jin Shim,
  • Jee-weon Jung,
  • Ju-ho Kim,
  • Ha-jin Yu

DOI
https://doi.org/10.3390/app11188361
Journal volume & issue
Vol. 11, no. 18
p. 8361

Abstract

Acoustic scene classification involves pairs of classes that are frequently misclassified because they share many common acoustic properties. Specific details can provide vital clues for distinguishing such pairs. However, these details are generally not noticeable and are hard to generalize across different data distributions. In this study, we investigate various methods for capturing discriminative information while simultaneously improving generalization ability. We adopt a max feature map method that replaces conventional non-linear activation functions in deep neural networks; it applies an element-wise comparison between the different filters of a convolution layer's output. Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power. Various experiments are conducted using the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Task 1-A dataset to validate the proposed methods. Our results show that the proposed system consistently outperforms the baseline, achieving an accuracy of 70.4% compared to the baseline's 65.1%.
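
The max feature map operation mentioned in the abstract can be illustrated with a minimal sketch. The abstract states only that an element-wise comparison is made between different filters of a convolution layer's output; the rest is assumed for illustration. In particular, pairing the first half of the channels with the second half, the (batch, channels, frequency, time) layout, and the function name max_feature_map are all hypothetical choices, not details confirmed by the paper.

```python
import numpy as np


def max_feature_map(x: np.ndarray) -> np.ndarray:
    """Element-wise maximum over two halves of the channel axis.

    Assumption: x has shape (batch, channels, freq, time) and the
    first half of the channels is compared with the second half.
    The output has half as many channels as the input.
    """
    if x.shape[1] % 2 != 0:
        raise ValueError("channel count must be even")
    first_half, second_half = np.split(x, 2, axis=1)
    # Keep the larger response at every position; this comparison
    # takes the place of an activation such as ReLU.
    return np.maximum(first_half, second_half)


# Stand-in for the output of a convolution layer with 8 filters.
rng = np.random.default_rng(0)
conv_out = rng.standard_normal((2, 8, 4, 5))
mfm_out = max_feature_map(conv_out)
print(mfm_out.shape)  # (2, 4, 4, 5)
```

Under this pairing, the operation halves the number of channels, so a layer that should pass on C feature maps would need 2C filters.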

Keywords