Ecological Indicators (Jun 2023)

A hierarchical birdsong feature extraction architecture combining static and dynamic modeling

  • Yanan Wang,
  • Aibin Chen,
  • Huaicheng Li,
  • Guoxiong Zhou,
  • Jizheng Yi,
  • Zhiqiang Zhang

Journal volume & issue
Vol. 150
p. 110258

Abstract

To conserve bird biodiversity and monitor the regional distribution of species, it is essential to identify birds by their songs and to explore the rich ecological information that birdsong contains. Audio recorded in monitoring areas generally contains complex background noise, the characteristics of the song are not prominent, and the biological spectral information is incomplete, all of which make bird identification challenging. This study proposes a hierarchical birdsong feature extraction architecture that combines static and dynamic modeling to cope with complex environments. First, six common speech features are extracted to characterize birdsong. The Pearson correlation coefficient is then used to analyze the correlations between birdsong and human speech, and to examine the correlations between the features in the presence and absence of environmental noise interference. Combined with a scatter plot matrix analysis, we conclude that the Mel Frequency Cepstral Coefficient (MFCC) is more suitable than the other features for birdsong and copes better with complex background noise. Second, a feature extraction architecture is built that integrates static and dynamic modeling to fully explore contextual relationships. It addresses the tendency of Transformer-type models to ignore the internal structure of each patch and to lose some spatial information. Finally, a hierarchical refinement module is designed to help extract more detailed features and to reduce the computational cost of Transformer-type models, which require large amounts of training data and have high complexity. The model achieves 93.67% accuracy on the self-built birdsong dataset, 95.19% accuracy on the public birdsong dataset Birdsdata, and 97.02% accuracy on the public environmental sound dataset UrbanSound8k.
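The feature comparison step described in the abstract can be illustrated with a minimal sketch of a Pearson correlation matrix computed with and without added noise. This is not the authors' code: the function name `pearson_matrix`, the three synthetic feature columns, and the noise level are all assumptions chosen for illustration, and real use would substitute per-frame features extracted from recordings.

```python
import numpy as np


def pearson_matrix(features: np.ndarray) -> np.ndarray:
    """Return the Pearson correlation matrix between the columns of `features`.

    `features` has shape (n_frames, n_features); each column is one feature.
    """
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered          # unnormalized covariance
    std = np.sqrt(np.diag(cov))          # unnormalized standard deviations
    return cov / np.outer(std, std)      # normalization factors cancel out


# Synthetic stand-in for per-frame features (assumption, not real birdsong data).
rng = np.random.default_rng(0)
n_frames = 500
base = rng.normal(size=n_frames)
clean = np.column_stack([
    base,                                            # hypothetical feature A
    0.8 * base + 0.2 * rng.normal(size=n_frames),    # feature B, related to A
    rng.normal(size=n_frames),                       # feature C, unrelated
])

# Simulate environmental noise interference by adding independent noise.
noisy = clean + 0.5 * rng.normal(size=clean.shape)

corr_clean = pearson_matrix(clean)
corr_noisy = pearson_matrix(noisy)

print("A-B correlation without noise:", round(float(corr_clean[0, 1]), 3))
print("A-B correlation with noise:   ", round(float(corr_noisy[0, 1]), 3))
```

In this toy setup, the correlation between the two related columns drops once noise is added. Comparing the two matrices feature by feature mirrors, in a simplified way, the kind of with-noise versus without-noise comparison the abstract mentions.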

Keywords