IEEE Open Journal of Signal Processing (Jan 2021)

Speech Enhancement With Phase Sensitive Mask Estimation Using a Novel Hybrid Neural Network

  • Mojtaba Hasannezhad,
  • Zhiheng Ouyang,
  • Wei-Ping Zhu,
  • Benoit Champagne

DOI
https://doi.org/10.1109/OJSP.2021.3067147
Journal volume & issue
Vol. 2
pp. 136 – 150

Abstract
A natural choice for modeling the strong temporal dynamics of speech is the recurrent neural network (RNN), since it can exploit the sequential information in consecutive acoustic frames and generalizes well to unseen speakers. In addition, the convolutional neural network (CNN) can automatically extract sophisticated speech features that maximize model performance. In this paper, we propose a hybrid neural network model that integrates a new low-complexity fully-convolutional CNN and a long short-term memory (LSTM) network, a variant of the RNN, to estimate a phase-sensitive mask for speech enhancement. The model is designed to take full advantage of the temporal dependencies and spectral correlations present in the input speech signal while keeping the model complexity low. An attention mechanism is also embedded to adaptively recalibrate the useful CNN-extracted features. Furthermore, a grouping strategy is employed to reduce the LSTM complexity while leaving performance almost unchanged. Through extensive comparative experiments, we show that the proposed model significantly outperforms several known neural network-based speech enhancement methods in the presence of highly non-stationary noise, while requiring relatively few model parameters compared to commonly employed DNN-based methods.
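As background for the training target mentioned above (a sketch of the standard phase-sensitive mask definition, not the authors' network), the ideal PSM for one time-frequency bin with clean STFT coefficient S and noisy coefficient Y is (|S|/|Y|)·cos(θ_S − θ_Y), which equals the real part of S/Y:

```python
import cmath
import math

def phase_sensitive_mask(s: complex, y: complex) -> float:
    """Ideal phase-sensitive mask for one time-frequency bin:
    (|S|/|Y|) * cos(theta_S - theta_Y), i.e. Re(S / Y)."""
    return (abs(s) / abs(y)) * math.cos(cmath.phase(s) - cmath.phase(y))

# Toy example: a clean bin corrupted by additive noise.
s = 1.0 + 1.0j          # clean STFT coefficient (hypothetical values)
n = 0.3 - 0.2j          # noise STFT coefficient
y = s + n               # noisy observation

m = phase_sensitive_mask(s, y)   # ≈ 0.90 for these values
s_hat = m * y                    # masked (enhanced) coefficient
```

Unlike a magnitude-only mask, the cosine term penalizes phase mismatch between clean and noisy bins, which is why PSM estimation tends to yield better SNR than ideal-ratio-mask targets.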

Keywords