Phase-Aware Speech Enhancement With Complex Wiener Filter

Huy Nguyen; Tuan Vu Ho; Masato Akagi; Masashi Unoki

doi:10.1109/ACCESS.2023.3341919

IEEE Access (Jan 2023)

Affiliations

Huy Nguyen: Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology (JAIST), Ishikawa, Nomi, Japan
Tuan Vu Ho: Media Intelligent Processing Research Department, Advanced Artificial Intelligent Innovation Center, Hitachi Ltd, Tokyo, Japan
Masato Akagi: Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology (JAIST), Ishikawa, Nomi, Japan
Masashi Unoki: Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology (JAIST), Ishikawa, Nomi, Japan

DOI: https://doi.org/10.1109/ACCESS.2023.3341919
Journal volume & issue: Vol. 11, pp. 141573–141584

Abstract

In speech enhancement, accurate phase reconstruction can significantly improve speech quality. While phase-aware speech enhancement methods using the complex ideal ratio mask (cIRM) have shown promise, the difficulty of estimating the phase is shared between the real and imaginary parts of the cIRM. The lack of clear patterns in the imaginary part makes it particularly hard to estimate. To address this issue, we propose a phase-aware speech enhancement method that uses a complex Wiener filter, which delegates the estimation of the speech and noise amplitude properties and of the phase property to separate models, mitigating the issues with the cIRM and making neural-network training more effective. Our method uses a speech-variance estimation model with a noise-robust vector-quantized variational autoencoder and a phase corrector that maximizes the scale-invariant signal-to-noise ratio (SI-SNR) in the time domain. To further improve speech-variance estimation, we propose a loss function that uses a categorical distribution of the fundamental frequency (F0) to enhance the spectral fine structure of the estimated speech variance. We evaluated our method on the open dataset released by Valentini et al. so that it can be compared directly with other speech-enhancement methods. Our method achieved a perceptual evaluation of speech quality (PESQ) score of 2.86 and a short-time objective intelligibility (STOI) score of 0.94, outperforming the state-of-the-art cIRM-estimation method from the 2020 Deep Noise Suppression Challenge. Our comprehensive analysis shows that incorporating the proposed loss function for spectral-fine-structure enhancement improves speech quality, especially when the F0 is low.
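
As a rough illustration of two quantities named in the abstract, the sketch below computes the SI-SNR between an estimate and a target signal, and applies a Wiener-style gain to a single complex spectral bin. It is only a minimal sketch under stated assumptions: the abstract does not give the paper's filter equations, so the gain here is the classical real-valued Wiener gain (speech variance over speech-plus-noise variance), and the function names and the `phase_correction` parameter are hypothetical stand-ins rather than the authors' implementation.

```python
import cmath
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    n = len(target)
    # Remove the mean so that a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target to obtain the scaled reference.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]
    # Whatever is left after the projection is treated as distortion.
    e_noise = [e - s for e, s in zip(est, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)


def wiener_gain(speech_var, noise_var, eps=1e-8):
    """Classical real-valued Wiener gain for one time-frequency bin.

    Assumption: the paper's complex Wiener filter may differ in form.
    """
    return speech_var / (speech_var + noise_var + eps)


def apply_filter(noisy_bin, speech_var, noise_var, phase_correction=0.0):
    """Scale a complex STFT bin by the gain and rotate its phase.

    `phase_correction` (radians) is a hypothetical stand-in for the
    output of a separate phase-correction model.
    """
    gain = wiener_gain(speech_var, noise_var)
    return gain * noisy_bin * cmath.exp(1j * phase_correction)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the SI-SNR essentially unchanged, which is what makes it usable as a time-domain training objective that ignores overall gain.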

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
