IEEE Access (Jan 2022)

Residual Learning for Marine Mammal Classification

  • Daniel T. Murphy,
  • Elias Ioup,
  • Md Tamjidul Hoque,
  • Mahdi Abdelguerfi

DOI
https://doi.org/10.1109/ACCESS.2022.3220735
Journal volume & issue
Vol. 10
pp. 118409 – 118418

Abstract

The passive acoustic monitoring of marine mammals is an essential tool for researchers tracking the populations of individual species in threatened environments. Given the large quantity of audio data generated by passive acoustic arrays, it is desirable to automate the process of identifying marine mammals present in the recordings. Utilizing acoustic data from the William A. Watkins Marine Mammal Sounds Database, we present an approach using residual learning networks (ResNets) for classifying the marine mammal vocalizations of up to 32 species. We first determine the optimal methods for converting acoustic recordings into discrete spectrograms suitable for input into neural networks. A series of configurations for spectrographic window functions, preprocessing augmentations, and multi-channel spectrogram generation are examined. Each configuration’s spectrographic output is used to train a residual learning network, and the network’s multi-class classification performance is ranked by its weighted F1-score, where F1 is the harmonic mean of precision and recall. Configurations specifying $512 \times 256$ spectrograms created with a Hann window of 1024 and utilizing horizontal roll demonstrate superior performance. We use the top-performing configurations to generate training data as input for a series of single and multi-channel residual neural networks. These networks are trained to high precision before their multi-class classification performance is evaluated. A single-channel network performed the best, obtaining an F1-score of 0.867 with an AUC of 0.9281 on a 32-class classification task. Our multi-channel configuration obtained an F1-score of 0.846 with an AUC of 0.9169. While we demonstrate that networks may learn more information from multi-channel spectrographic inputs, we find that single-channel spectrograms offer superior classification performance overall.
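
As an illustration of the ranking metric named in the abstract, the following is a minimal sketch of a weighted F1-score, assuming "weighted" means that each class's F1 (the harmonic mean of its precision and recall) is averaged with weights proportional to the number of true samples in that class. The function name `weighted_f1` and the toy species labels are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Assumption: weights are each class's share of the true labels.
    A class that is never predicted contributes an F1 of 0.
    """
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: samples of this class that were predicted as this class
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_true / total) * f1
    return score


# Toy labels for illustration only (not data from the paper)
y_true = ["orca", "orca", "orca", "beluga", "beluga", "narwhal"]
y_pred = ["orca", "orca", "beluga", "beluga", "beluga", "orca"]
score = weighted_f1(y_true, y_pred)
print(round(score, 3))
```

Weighting by class support keeps a score on a 32-class task from being dominated by the rarest species, although it also means that errors on rare classes move the score less than errors on common ones.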

Keywords