IEEE Access (Jan 2019)

Investigation of Different CNN-Based Models for Improved Bird Sound Classification

  • Jie Xie,
  • Kai Hu,
  • Mingying Zhu,
  • Jinghu Yu,
  • Qibing Zhu

DOI
https://doi.org/10.1109/ACCESS.2019.2957572
Journal volume & issue
Vol. 7
pp. 175353–175361

Abstract


Automatic bird sound classification plays an important role in monitoring and protecting biodiversity. Recent advances in acoustic sensor networks and deep learning provide a novel way to continuously monitor birds. Previous studies have proposed various deep-learning-based frameworks for recognizing and classifying bird sounds. In this study, we compare different classification models and selectively fuse them to further improve bird sound classification performance. Specifically, we not only use the same deep learning architecture with different inputs but also employ two different deep learning architectures to construct the fused model. Three types of time-frequency representations (TFRs) of bird sounds are investigated, each aimed at characterizing different acoustic components of bird vocalizations: the Mel-spectrogram, a harmonic-component-based spectrogram, and a percussive-component-based spectrogram. In addition to the models trained on different TFRs, a second deep learning architecture, SubSpectralNet, is employed to classify bird sounds. Experimental results on classifying 43 bird species show that fusing selected deep learning models effectively increases classification performance: our best fused model achieves a balanced accuracy of 86.31% and a weighted F1-score of 93.31%.
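The harmonic- and percussive-component spectrograms mentioned above are commonly derived via harmonic-percussive source separation (HPSS). The abstract does not specify the authors' exact procedure, so the following is only a minimal sketch of the standard median-filtering approach (soft masking of a magnitude spectrogram); the `kernel` size and the toy input are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.ndimage import median_filter

def hpss_masks(S, kernel=17):
    """Split a magnitude spectrogram S (n_freq x n_frames) into
    harmonic and percussive components via median-filter soft masking.

    Harmonic partials appear as horizontal ridges (stable frequency over
    time), so filtering along the time axis enhances them; percussive
    events appear as vertical ridges (broadband, short-lived), so
    filtering along the frequency axis enhances those.
    """
    H = median_filter(S, size=(1, kernel))  # smooth along time
    P = median_filter(S, size=(kernel, 1))  # smooth along frequency
    eps = 1e-10
    mask_h = H / (H + P + eps)
    mask_p = P / (H + P + eps)
    return S * mask_h, S * mask_p

# Toy spectrogram: one sustained tone (horizontal line) plus one
# broadband click (vertical line).
S = np.zeros((64, 64))
S[20, :] = 1.0   # harmonic content
S[:, 40] = 1.0   # percussive content
Sh, Sp = hpss_masks(S)
```

In a full pipeline, each separated component would then be passed through a Mel filterbank and log compression to produce the harmonic- and percussive-component spectrogram inputs for the CNNs.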

Keywords