IEEE Access (Jan 2021)

Cluster Analysis for the Separation of Auditory Scenes

  • Matthew S. Daley,
  • Lia M. Bonacci,
  • David H. Gever,
  • Krystina Diaz,
  • Jeffrey B. Bolkhovsky

DOI
https://doi.org/10.1109/ACCESS.2021.3113615
Journal volume & issue
Vol. 9
pp. 130959 – 130967

Abstract

The “cocktail party problem” refers to the ability of human listeners to separate the acoustic signal reaching their ears into its individual components, each corresponding to an individual sound source in the environment. Although this task appears trivial for humans, solving the cocktail party problem computationally remains an ambitious challenge. The approach used in this paper takes inspiration from human strategies for separating an acoustic environment into distinct perceptual auditory streams. A series of time-frequency-based features, analogous to those thought to emerge at various stages in the human auditory processing pathway, are derived from binaural auditory inputs. These feature vectors serve as inputs to an unsupervised cluster analysis that groups feature values assumed to correspond to the same object. Reconstructed auditory streams are then correlated with the original components used to create the auditory scene. Our model is capable of reconstructing streams that correlate with these original components (r = 0.3–0.7). The success of the reconstructions is largely dependent on the signal-to-noise ratio of the components of the auditory scene.
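The pipeline summarized above (binaural time-frequency features, unsupervised clustering, stream reconstruction, correlation with the original components) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single feature, the interaural level difference (ILD), a small hand-written one-dimensional k-means as the clustering step, an energy floor that excludes near-silent bins, and a toy scene of two pure tones. All function names, parameter values, and signals below are hypothetical.

```python
import numpy as np

N_FFT, HOP = 256, 128  # frame length and 50% hop (illustrative values)
# Periodic Hann window: at 50% overlap the shifted copies sum to 1,
# so plain overlap-add inverts the transform away from the signal edges.
WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT)

def stft(x):
    """Short-time Fourier transform, shape (frames, frequency bins)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, length):
    """Overlap-add resynthesis matching stft() above."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1)
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out

def kmeans_1d(values, k, n_iter=50):
    """Lloyd's algorithm on scalars; centers start evenly spaced from min to max."""
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    return labels

def separate(left, right, k=2, floor_db=-30.0):
    """Cluster time-frequency bins by ILD and resynthesize one stream per cluster."""
    L, R = stft(left), stft(right)
    # Assumption: only bins within floor_db of the loudest bin carry usable cues.
    power = np.abs(L) ** 2 + np.abs(R) ** 2
    active = power > power.max() * 10 ** (floor_db / 10)
    ild = 20 * np.log10((np.abs(L) + 1e-12) / (np.abs(R) + 1e-12))
    labels = np.full(ild.shape, -1)          # -1 marks bins left out of every stream
    labels[active] = kmeans_1d(ild[active], k)
    mix = 0.5 * (L + R)                      # mono mixture to be masked
    return [istft(mix * (labels == j), len(left)) for j in range(k)]

def pearson_r(a, b):
    """Pearson correlation between two equal-length signals."""
    return np.corrcoef(a, b)[0, 1]

# Toy scene: two tones, each louder in a different ear.
fs = 8000
t = np.arange(fs) / fs
src_a = np.sin(2 * np.pi * 440 * t)    # left-dominant source
src_b = np.sin(2 * np.pi * 1300 * t)   # right-dominant source
left = 0.9 * src_a + 0.3 * src_b
right = 0.3 * src_a + 0.9 * src_b

streams = separate(left, right)
# Streams come out ordered by ILD, so index 0 is the right-dominant cluster.
print("stream 0 vs src_b:", pearson_r(streams[0], src_b))
print("stream 1 vs src_a:", pearson_r(streams[1], src_a))
```

Because the two tones barely overlap in frequency, this toy scene separates far more cleanly than the complex scenes the paper evaluates, so the correlations it prints should not be compared with the reported r = 0.3–0.7.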

Keywords