Entropy (Apr 2023)

CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement

  • Haozhe Chen,
  • Xiaojuan Zhang

DOI
https://doi.org/10.3390/e25040628
Journal volume & issue
Vol. 25, no. 4
p. 628

Abstract

In recent years, neural networks based on attention mechanisms have seen increasing use in speech recognition, separation, and enhancement, as well as other fields. In particular, the convolution-augmented transformer has performed well, as it combines the advantages of convolution and self-attention. Recently, the gated attention unit (GAU) was proposed. Compared with traditional multi-head self-attention, approaches with GAU are effective and computationally efficient. In this paper, we propose a network for speech enhancement called CGA-MGAN, a kind of MetricGAN based on convolution-augmented gated attention. CGA-MGAN captures local and global correlations in speech signals simultaneously by fusing convolution and gated attention units. Experiments on Voice Bank + DEMAND show that our proposed CGA-MGAN model achieves excellent performance (3.47 PESQ, 0.96 STOI, and 11.09 dB SSNR) with a relatively small model size (1.14 M).
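
To make the gated attention unit mentioned above more concrete, the following is a minimal NumPy sketch of a single-head GAU forward pass, loosely following the general GAU formulation rather than the CGA-MGAN implementation itself. It covers only the gated attention part: the convolution branch, layer normalization, relative position bias, and the MetricGAN discriminator are all omitted. The function names, parameter names (`w_u`, `w_v`, `w_z`, `w_o`, `gamma_*`, `beta_*`), and dimensions are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def init_params(rng, d, e, s):
    """Random parameters for the sketch (names and sizes are hypothetical).

    d: input feature size, e: expanded gate/value size, s: shared query/key size.
    """
    return {
        "w_u": rng.standard_normal((d, e)) / np.sqrt(d),
        "w_v": rng.standard_normal((d, e)) / np.sqrt(d),
        "w_z": rng.standard_normal((d, s)) / np.sqrt(d),
        "w_o": rng.standard_normal((e, d)) / np.sqrt(e),
        # Per-dimension scale and offset that turn the shared base z into q and k.
        "gamma_q": np.ones(s), "beta_q": np.zeros(s),
        "gamma_k": np.ones(s), "beta_k": np.zeros(s),
    }


def gated_attention_unit(x, p):
    """Simplified single-head GAU on a sequence x of shape (n, d)."""
    n = x.shape[0]
    u = silu(x @ p["w_u"])                  # (n, e) gate branch
    v = silu(x @ p["w_v"])                  # (n, e) value branch
    z = silu(x @ p["w_z"])                  # (n, s) shared low-dimensional base
    q = z * p["gamma_q"] + p["beta_q"]      # (n, s) queries
    k = z * p["gamma_k"] + p["beta_k"]      # (n, s) keys
    # Squared-ReLU attention weights instead of softmax (simplifying assumption).
    a = np.maximum(q @ k.T / n, 0.0) ** 2   # (n, n)
    out = (u * (a @ v)) @ p["w_o"]          # gate the attended values, project back
    return x + out                          # residual connection


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8))             # 6 frames, 8 features each
params = init_params(rng, d=8, e=16, s=4)
y = gated_attention_unit(x, params)
print(y.shape)
```

The elementwise product `u * (a @ v)` is the gating step: the attention output is modulated per feature by `u`, which is what lets a single, weaker attention head stand in for multi-head self-attention in this formulation.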

Keywords