IEEE Access (Jan 2020)

Embedding Encoder-Decoder With Attention Mechanism for Monaural Speech Enhancement

  • Tian Lan,
  • Wenzheng Ye,
  • Yilan Lyu,
  • Junyi Zhang,
  • Qiao Liu

DOI: https://doi.org/10.1109/ACCESS.2020.2995346
Journal volume & issue: Vol. 8, pp. 96677–96685

Abstract

The auditory selection framework with attention and memory (ASAM), which combines an attention mechanism, an embedding generator, a generated embedding array, and a life-long memory, was designed to process mixed speech. When ASAM is applied to speech enhancement, the large discrepancy between the voice and noise feature memories increases the separability of voice and noise. However, ASAM cannot achieve desirable speech enhancement performance because it fails to exploit the time-frequency dependence among the embedding vectors when generating the mask for each time-frequency unit. This work proposes a novel embedding encoder-decoder (EED) that uses a convolutional neural network (CNN) as the decoder. CNNs excel at detecting local patterns, which the decoder exploits to extract correlated embeddings from the embedding array and generate the target spectrogram. This work evaluates a comparable ASAM, an EED with an LSTM encoder and a CNN decoder (RC-EED), RC-EED with an attention mechanism (RC-AEED), other similar EED structures, and baseline models. Experimental results show that the RC-EED and RC-AEED networks perform well on the speech enhancement task under low signal-to-noise-ratio conditions. In addition, RC-AEED outperforms ASAM on speech enhancement and achieves better speech quality than the deep recurrent network and convolutional recurrent network baselines.
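The abstract describes the EED data flow at a high level: an encoder maps each time-frequency (T-F) unit of the mixture spectrogram to an embedding vector, and a CNN decoder exploits local T-F correlations in the resulting embedding array to produce a mask for the target speech. A minimal NumPy sketch of that flow, assuming illustrative shapes and weights (the encoder here is a placeholder linear projection, not the paper's LSTM, and all names and dimensions are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

T, F, D = 100, 129, 20  # frames, frequency bins, embedding dim (assumed)

# Mixture magnitude spectrogram |Y| (T x F); random stand-in for STFT features.
mix_mag = np.abs(rng.standard_normal((T, F)))

# --- Encoder (placeholder): map each T-F unit to a D-dim embedding. ---
# In the paper this role is played by an LSTM over frames; a fixed linear
# projection keeps the sketch self-contained.
W_enc = rng.standard_normal((1, D)) * 0.1
embedding_array = mix_mag[:, :, None] * W_enc          # (T, F, D)

# --- CNN decoder: one 2-D convolution over the T-F plane that mixes ---
# neighbouring embeddings into a single mask logit per T-F unit.
kT, kF = 3, 3                                          # kernel size (assumed)
W_dec = rng.standard_normal((kT, kF, D)) * 0.1

padded = np.pad(embedding_array, ((kT // 2,), (kF // 2,), (0,)), mode="edge")
logits = np.empty((T, F))
for t in range(T):
    for f in range(F):
        patch = padded[t:t + kT, f:f + kF, :]          # local embedding patch
        logits[t, f] = np.sum(patch * W_dec)

mask = 1.0 / (1.0 + np.exp(-logits))                   # sigmoid mask in [0, 1]
enhanced_mag = mask * mix_mag                          # masked target spectrogram

print(enhanced_mag.shape)  # (100, 129)
```

The sketch shows why a convolutional decoder helps: each mask value is computed from a neighbourhood of embeddings rather than from a single T-F unit, which is exactly the local T-F dependence the abstract says ASAM fails to exploit.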
