IEEE Access (Jan 2021)

EM-Based TDOA Estimation of a Speech Source via Gaussian Mixture Models in Noisy and Anechoic Environments

  • Zhihua Lu,
  • Joao P. J. Da Costa,
  • Tai Fei

DOI
https://doi.org/10.1109/ACCESS.2021.3119749
Journal volume & issue
Vol. 9
pp. 142605 – 142615

Abstract

The difference in propagation delay of a speech signal travelling from the source to two microphones, known as the time difference of arrival (TDOA), carries information about the position of the speech source. TDOA estimation plays a vital role in diverse systems such as teleconferencing and far-field speech recognition, since the TDOA is a key parameter affecting the quality of the restored speech signals. This paper is devoted to estimating the TDOA of a single speech source on a frame-by-frame basis in noisy and anechoic environments. First, we propose two variants of a Gaussian mixture model to represent the speech signal received by a microphone pair, assuming Gaussianity of the signal and modeling speech sparsity by the speech presence probability (SPP). Second, after estimating the noise parameter in advance and formulating the speech parameters using the maximum likelihood principle, the proposed Gaussian mixture models are reduced to depending on only two unknowns, i.e., the TDOA and the SPP. Third, following these two models, we present two distinct estimators that estimate the TDOA and the SPP iteratively based on the expectation maximization (EM) algorithm. The two proposed estimators are free from the ad hoc parameter selection required by many classical approaches. Simulation results show that the TDOA estimated by them could be more accurate than that of the state-of-the-art GCC variants in a wide range of frames with specific SPP values. More importantly, the automatically estimated SPP, which can serve as a soft voice activity detector, encodes information about the accuracy of the TDOA estimate. In a speech frame, a large estimated SPP indicates that the estimated TDOA has a small error. For example, when the SPP is larger than 0.76 and 0.87 in the two proposed estimators, respectively, the TDOA estimation error could be at most 19% of that in the worst case.
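For readers unfamiliar with the quantity being estimated, the following is a minimal sketch of the classical idea behind the cross-correlation baselines the abstract compares against: the TDOA is taken as the lag that maximizes the cross-correlation of the two microphone signals. It is not the EM-based estimators proposed in the paper, and it omits the frequency weighting used by GCC variants. The function names `estimate_tdoa` and `simulate_pair`, the white-noise source, and all parameter values are assumptions made for illustration only.

```python
import random


def estimate_tdoa(x1, x2, max_lag):
    """Return the integer lag (in samples) that maximizes the cross-correlation.

    A positive result means x2 is a delayed copy of x1.
    """
    n = min(len(x1), len(x2))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Sum x1[t] * x2[t + lag] over all indices where both samples exist.
        start = max(0, -lag)
        stop = min(n, n - lag)
        score = sum(x1[t] * x2[t + lag] for t in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def simulate_pair(n, delay, noise_std, seed=0):
    """Hypothetical test signal: white source, delayed copy, additive noise.

    `delay` must be a non-negative number of samples.
    """
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n + delay)]
    # Microphone 1 hears the source `delay` samples earlier than microphone 2.
    x1 = [source[t + delay] + rng.gauss(0.0, noise_std) for t in range(n)]
    x2 = [source[t] + rng.gauss(0.0, noise_std) for t in range(n)]
    return x1, x2


if __name__ == "__main__":
    x1, x2 = simulate_pair(n=512, delay=5, noise_std=0.5, seed=1)
    lag = estimate_tdoa(x1, x2, max_lag=16)
    fs = 16000  # assumed sampling rate in Hz, for unit conversion only
    print(f"estimated TDOA: {lag} samples = {lag / fs * 1e3:.4f} ms")
```

Dividing the lag by the sampling rate converts it to seconds. A plain peak search like this returns an estimate for every frame, including frames that contain no speech, which is the gap a per-frame confidence measure such as the SPP described above is meant to address.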

Keywords