IEEE Access (Jan 2020)

Sound Source Localization Based on GCC-PHAT With Diffuseness Mask in Noisy and Reverberant Environments

  • Ran Lee,
  • Min-Seok Kang,
  • Bo-Hyun Kim,
  • Kang-Ho Park,
  • Sung Q Lee,
  • Hyung-Min Park

DOI
https://doi.org/10.1109/ACCESS.2019.2963768
Journal volume & issue
Vol. 8
pp. 7373 – 7382

Abstract

Although sound source localization is a desirable technique in many communication systems and intelligence applications, the distortion caused by diffuse noise or reverberation makes time delay estimation (TDE) between the signals acquired by a pair of microphones a complicated and challenging problem. In this paper, we describe a method that can efficiently achieve sound source localization in noisy and reverberant environments. The method is based on the generalized cross-correlation (GCC) function with phase transform (PHAT) weights (GCC-PHAT), which provides robustness against reverberation. In addition, to make the time delay estimate robust to diffuse components and to further improve the robustness of GCC-PHAT against reverberation, the time-frequency (t-f) components of the observations that are directly emitted by a point source are selected by the "inversed" diffuseness. The diffuseness, which can be estimated from the coherent-to-diffuse power ratio (CDR) based on the spatial coherence between two microphones, represents the contribution of diffuse components on a scale of zero to one, with direct sounds from a source modeled as fully coherent. In particular, the "inversed" diffuseness is binarized with a very strict threshold so that only highly reliable components are selected, enabling accurate TDE even in noisy and reverberant environments. Experimental results for both simulated and real-recorded data consistently demonstrated the robustness of the presented method against diffuse noise and reverberation.
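
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the commonly used relation D = 1 / (1 + CDR) between diffuseness and CDR, takes the CDR values as given rather than estimating them from spatial coherence, and processes a single frame instead of a full t-f representation. The function names, the threshold value of 0.9, and the small regularization constant are all hypothetical choices made for illustration.

```python
import numpy as np


def diffuseness_from_cdr(cdr):
    """Map CDR (>= 0) to diffuseness in [0, 1], assuming D = 1 / (1 + CDR)."""
    cdr = np.maximum(np.asarray(cdr, dtype=float), 0.0)
    return 1.0 / (1.0 + cdr)


def binary_mask(diffuseness, threshold=0.9):
    """Binarize the 'inversed' diffuseness (1 - D) with a strict threshold.

    The threshold value here is illustrative, not taken from the paper.
    """
    return (1.0 - np.asarray(diffuseness)) >= threshold


def gcc_phat_delay(x1, x2, fs, mask=None, max_tau=None):
    """Estimate the delay of x2 relative to x1 (seconds) with GCC-PHAT.

    A positive result means x2 lags x1. `mask`, if given, is a per-bin
    weight of length n // 2 + 1 with n = len(x1) + len(x2).
    """
    n = len(x1) + len(x2)  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    cross = X2 * np.conj(X1)         # cross-power spectrum
    cross /= np.abs(cross) + 1e-12   # PHAT weighting: keep phase only
    if mask is not None:
        cross = cross * mask         # drop bins judged to be diffuse

    cc = np.fft.irfft(cross, n=n)    # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs
```

In this sketch, a mask produced by `binary_mask(diffuseness_from_cdr(cdr))` is passed to `gcc_phat_delay` so that only bins with high CDR contribute to the correlation peak. A frame-wise version would apply the same weighting per STFT frame and accumulate the masked cross-spectra before the inverse transform.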

Keywords