IEEE Access (Jan 2023)
Phase-Aware Speech Enhancement With Complex Wiener Filter
Abstract
In speech enhancement, accurate phase reconstruction can significantly improve speech quality. While phase-aware speech enhancement methods using the complex ideal ratio mask (cIRM) have shown promise, the estimation difficulty of the phase is shared with the real and imaginary parts of the cIRM. The pattern lacking in the imaginary part poses particular difficulties. To address this issue, we proposed a phase-aware speech enhancement method that uses a complex Wiener filter, which delegates the estimation of speech and noise amplitude properties and the phase property to different models, mitigating the issues with the cIRM and improving the effectiveness of neural-network training. Our method uses a speech-variance estimation model with a noise-robust vector-quantized variational autoencoder and a phase corrector that maximizes the scale-invariant signal-to-noise ratio in the time domain. To further improve speech-variance estimation, we propose a loss function that uses a categorical distribution of fundamental frequency (F0) for enhancing the spectral fine structure of estimated speech variance. We evaluated our method on the open dataset released by Valentini et al. to directly compare it with other speech-enhancement methods. Our method achieved a perceptual evaluation of speech quality score of 2.86 and short-time objective intelligibility score of 0.94, better than the state-of-the-art method based on cIRM estimation during the 2020 Deep Noise Challenge. Our comprehensive analysis shows that incorporating the proposed loss function for spectral-fine-structure enhancement improves speech quality, especially when the F0 is low.
Keywords