IEEE Access (Jan 2018)

The Robustness of Echoic Log-Surprise Auditory Saliency Detection

  • Antonio Rodriguez-Hidalgo,
  • Carmen Pelaez-Moreno,
  • Ascension Gallardo-Antolin

DOI
https://doi.org/10.1109/ACCESS.2018.2882055
Journal volume & issue
Vol. 6
pp. 72083 – 72093

Abstract

The concept of saliency describes how relevant a stimulus is for humans. This phenomenon has been studied from different perspectives and in different modalities, such as audio, visual, or both. It has been employed in intelligent systems that interact with their environment in an attempt to emulate or even outperform human behavior in tasks such as surveillance, alarm systems, or robotics. In this paper, we focus on the aural modality, and our goal is to measure the robustness of Echoic log-surprise in comparison with a set of auditory saliency techniques when tested in noisy environments on the task of saliency detection. The acoustic saliency methods that we have analyzed include Kalinli’s saliency model, Bayesian log-surprise, and our proposed algorithm, Echoic log-surprise. This last method combines an unsupervised approach based on Bayesian log-surprise with the biological concept of echoic or auditory sensory memory by means of a statistical fusion scheme, for which different distance metrics and statistical divergences, such as Renyi’s or Jensen-Shannon’s, are considered. Additionally, for comparison purposes, we have also evaluated some classical onset detection techniques, such as those based on voice activity detection or energy thresholding. Results show that Echoic log-surprise outperforms the rest of the techniques analyzed in this paper in detection capability under a great variety of noises and signal-to-noise ratios, corroborating its robustness in noisy environments. In particular, our algorithm with the Jensen-Shannon fusion scheme produces the best F-scores. To better understand the behavior of Echoic log-surprise, we have also studied the influence of its control parameters, depth and memory, at different noise levels.
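For readers unfamiliar with two of the quantities named in the abstract, the sketch below shows the textbook definitions of the Jensen-Shannon divergence between two discrete distributions and of the F-score computed from detection counts. It is only an illustration of those standard formulas. The abstract does not describe how the fusion scheme applies the divergence or how detections are counted, so the function names (`kl_divergence`, `js_divergence`, `f_score`) and the example inputs are assumptions made here, not the authors' implementation.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for discrete distributions.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric, bounded in [0, 1]."""
    # Mixture distribution halfway between p and q.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def f_score(true_pos, false_pos, false_neg):
    """F1 score: harmonic mean of precision and recall."""
    detected = true_pos + false_pos
    relevant = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / relevant if relevant else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example distributions and counts, chosen for illustration.
    print(js_divergence([0.5, 0.5], [0.9, 0.1]))
    print(f_score(true_pos=8, false_pos=2, false_neg=2))
```

With base-2 logarithms the Jensen-Shannon divergence is 0 for identical distributions and 1 for distributions with disjoint support, which makes it a convenient bounded measure of how much two distributions differ.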

Keywords