IEEE Open Journal of Signal Processing (Jan 2021)

Speech Enhancement With Phase Sensitive Mask Estimation Using a Novel Hybrid Neural Network

  • Mojtaba Hasannezhad,
  • Zhiheng Ouyang,
  • Wei-Ping Zhu,
  • Benoit Champagne

DOI
https://doi.org/10.1109/OJSP.2021.3067147
Journal volume & issue
Vol. 2
pp. 136 – 150

Abstract
A natural choice for modeling the strong temporal dynamics of speech is the recurrent neural network (RNN), since it can exploit the sequential information in consecutive acoustic frames and generalizes well to unseen speakers. In addition, the convolutional neural network (CNN) can automatically extract sophisticated speech features that maximize model performance. In this paper, we propose a hybrid neural network model that integrates a new low-complexity fully-convolutional CNN and a long short-term memory (LSTM) network, a variant of the RNN, to estimate a phase-sensitive mask for speech enhancement. The model is designed to take full advantage of the temporal dependencies and spectral correlations present in the input speech signal while keeping the model complexity low. An attention mechanism is also embedded to adaptively recalibrate the useful CNN-extracted features. Furthermore, a grouping strategy is employed to reduce the LSTM complexity while leaving performance almost unchanged. Through extensive comparative experiments, we show that the proposed model significantly outperforms several known neural network-based speech enhancement methods in the presence of highly non-stationary noise, while requiring relatively few model parameters compared to commonly employed DNN-based methods.
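As background for the training target mentioned above (a sketch of the standard phase-sensitive mask definition, not the authors' network), the ideal PSM for one time-frequency bin with clean STFT coefficient S and noisy coefficient Y is (|S|/|Y|)·cos(θ_S − θ_Y), which equals the real part of S/Y:

```python
import cmath
import math

def phase_sensitive_mask(s: complex, y: complex) -> float:
    """Ideal phase-sensitive mask for one time-frequency bin:
    (|S|/|Y|) * cos(theta_S - theta_Y), i.e. Re(S / Y)."""
    return (abs(s) / abs(y)) * math.cos(cmath.phase(s) - cmath.phase(y))

# Toy example: a clean bin corrupted by additive noise.
s = 1.0 + 1.0j          # clean STFT coefficient (hypothetical values)
n = 0.3 - 0.2j          # noise STFT coefficient
y = s + n               # noisy observation

m = phase_sensitive_mask(s, y)   # ≈ 0.90 for these values
s_hat = m * y                    # masked (enhanced) coefficient
```

Unlike a magnitude-only mask, the cosine term penalizes phase mismatch between clean and noisy bins, which is why PSM estimation tends to yield better SNR than ideal-ratio-mask targets.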

Keywords