IEEE Access (Jan 2020)

A Comparative Analysis of Residual Block Alternatives for End-to-End Audio Classification

  • Javier Naranjo-Alcazar
  • Sergi Perez-Castanos
  • Irene Martin-Morato
  • Pedro Zuccarello
  • Francesc J. Ferri
  • Maximo Cobos

DOI
https://doi.org/10.1109/ACCESS.2020.3031685
Journal volume & issue
Vol. 8
pp. 188875–188882

Abstract


Residual learning is a framework known to facilitate the training of very deep neural networks. Residual blocks or units are made up of a set of stacked layers whose inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of so-called skip or shortcut connections. However, multiple implementation alternatives arise with respect to where these skip connections are applied within the set of stacked layers that makes up a residual block. While residual networks for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, their adoption in 1D end-to-end architectures is still scarce in the audio domain. Thus, the suitability of different residual block designs for raw audio classification remains partly unknown. The purpose of this article is to compare, analyze, and discuss the performance of several residual block implementations, namely those most commonly used in image classification problems, within a state-of-the-art CNN-based architecture for end-to-end audio classification using raw audio waveforms. Careful statistical analyses of six residual block alternatives are conducted, considering two well-known datasets and common input normalization choices. The results show that, while some significant differences in performance are observed among architectures using different residual block designs, the selection of the most suitable residual block can be highly dependent on the input data.
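To make the design space concrete, the sketch below shows two of the best-known skip-connection placements in PyTorch, adapted to 1D convolutions as would be used for raw-waveform input: the original post-activation block and the full pre-activation variant. This is a minimal illustration under stated assumptions, not the architecture evaluated in the paper; the class names, channel count, kernel size, and the 16 kHz input length are hypothetical choices for the example.

```python
import torch
import torch.nn as nn


class PostActResBlock1d(nn.Module):
    """Original 'post-activation' residual block, adapted to 1D
    convolutions: the skip connection is added before the final ReLU."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2  # 'same' padding so the skip shapes match
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=pad, bias=False)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=pad, bias=False)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # addition first, then activation


class PreActResBlock1d(nn.Module):
    """'Pre-activation' variant: BN and ReLU are moved before each
    convolution, leaving the skip path as a clean identity mapping."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=pad, bias=False)
        self.bn2 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=pad, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        return out + x  # identity skip; no activation after the addition


# Shape check on a batch of 1-second raw waveforms at an assumed 16 kHz,
# laid out as (batch, channels, samples); 64 channels is illustrative.
x = torch.randn(4, 64, 16000)
print(PostActResBlock1d(64)(x).shape)  # torch.Size([4, 64, 16000])
print(PreActResBlock1d(64)(x).shape)   # torch.Size([4, 64, 16000])
```

The two blocks contain identical layers; the only difference is where the normalization and activation sit relative to the addition, which is precisely the implementation axis the article's comparison is concerned with.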

Keywords