Two-Stage Single-Channel Speech Enhancement with Multi-Frame Filtering

Shaoxiong Lin; Wangyou Zhang; Yanmin Qian

doi:10.3390/app13084926

Applied Sciences (Apr 2023)

Affiliations

Shaoxiong Lin: X-LANCE Lab, MoE Key Lab of Artificial Intelligence, AI Institute, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Wangyou Zhang: X-LANCE Lab, MoE Key Lab of Artificial Intelligence, AI Institute, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Yanmin Qian: X-LANCE Lab, MoE Key Lab of Artificial Intelligence, AI Institute, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

DOI: https://doi.org/10.3390/app13084926
Journal volume & issue: Vol. 13, no. 8, p. 4926

Abstract

Speech enhancement has been extensively studied and applied in automatic speech recognition (ASR), speaker recognition, and related fields. With advances in deep learning, deep neural networks (DNNs) applied to speech enhancement have achieved remarkable results, greatly improving the quality of enhanced speech. In this study, we propose a two-stage model for single-channel speech enhancement. The model consists of two DNNs with the same architecture. In the first stage, only the first DNN is trained. In the second stage, the first DNN is frozen and the second DNN is trained to refine its enhanced output. A multi-frame filter is introduced to help the second DNN reduce distortion in the enhanced speech. Experimental results on both synthetic and real datasets show that the proposed model outperforms other enhancement models in speech enhancement evaluation metrics and word error rate (WER), and that it also generalizes better. Ablation experiments further demonstrate that combining the two-stage model with the multi-frame filter yields better enhancement performance and less distortion.
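
The abstract does not give the form of the multi-frame filter. As a rough illustration only, the sketch below assumes one common formulation: each enhanced time-frequency bin is a complex-weighted sum of the current and previous frames at the same frequency. The function name `multi_frame_filter`, the array shapes, and the causal zero-padded layout are assumptions made for this example and are not taken from the paper. In a two-stage setup such as the one described, the coefficients would presumably be predicted by the second DNN and applied to a spectrogram, but that wiring is not shown here.

```python
import numpy as np


def multi_frame_filter(spec, coeffs):
    """Apply a causal multi-frame filter to a complex spectrogram.

    Hypothetical helper for illustration; not the paper's implementation.

    spec:   complex array of shape (T, F), one row per frame.
    coeffs: complex array of shape (N, T, F); coeffs[i, t, f] weights
            the bin at frame t - i and frequency f.
    Returns a complex array of shape (T, F).
    """
    num_taps, num_frames, num_bins = coeffs.shape
    assert spec.shape == (num_frames, num_bins)
    assert num_taps <= num_frames

    out = np.zeros((num_frames, num_bins), dtype=complex)
    for i in range(num_taps):
        # Delay the spectrogram by i frames, zero-padding the start.
        shifted = np.zeros((num_frames, num_bins), dtype=complex)
        shifted[i:] = spec[: num_frames - i]
        out += coeffs[i] * shifted
    return out


# Toy input: 3 frames, 2 frequency bins.
spec = np.arange(6).reshape(3, 2).astype(complex)

# With all weight on tap 0, the filter reduces to a pass-through.
identity = np.zeros((2, 3, 2), dtype=complex)
identity[0] = 1.0
passthrough = multi_frame_filter(spec, identity)

# With all weight on tap 1, the output is the input delayed by one frame.
delay = np.zeros((2, 3, 2), dtype=complex)
delay[1] = 1.0
delayed = multi_frame_filter(spec, delay)
```

With a single tap (N = 1), this reduces to ordinary complex masking of each bin. Additional taps let the filter draw on neighboring frames, which is the usual motivation for multi-frame filtering.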

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
