Applied Sciences (Aug 2023)

Exploring Multi-Stage GAN with Self-Attention for Speech Enhancement

  • Bismark Kweku Asiedu Asante,
  • Clifford Broni-Bediako,
  • Hiroki Imamura

DOI
https://doi.org/10.3390/app13169217
Journal volume & issue
Vol. 13, no. 16
p. 9217

Abstract

Multi-stage or multi-generator generative adversarial networks (GANs) have recently been demonstrated to be effective for speech enhancement. However, the existing multi-generator GANs for speech enhancement use only convolutional layers for synthesising clean speech signals, and this reliance on the convolution operation may mask the temporal dependencies within the signal sequence. This study explores self-attention as a way to address the temporal dependency issue in multi-generator speech enhancement GANs and improve their enhancement performance. We empirically study the effect of integrating a self-attention mechanism into the convolutional layers of the multiple generators in multi-stage or multi-generator speech enhancement GANs, specifically the ISEGAN and DSEGAN networks. The experimental results show that introducing a self-attention mechanism into ISEGAN and DSEGAN improves their speech enhancement quality and intelligibility across the objective evaluation metrics. Furthermore, we observe that adding self-attention to the ISEGAN's generators not only improves its enhancement performance but also bridges the performance gap between ISEGAN and DSEGAN with a smaller model footprint. Overall, our findings highlight the potential of self-attention for improving the enhancement performance of multi-generator speech enhancement GANs.
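
The sketch below illustrates, in PyTorch, the general idea the abstract describes: a SAGAN-style self-attention block inserted after a 1-D convolutional layer of a SEGAN-like generator encoder. It is not the authors' implementation; the layer sizes, kernel width, and placement of the attention block are illustrative assumptions.

```python
# A minimal sketch (not the paper's code) of augmenting a 1-D convolutional
# generator stage with self-attention over the time axis, so that distant
# time steps can interact beyond the convolution's receptive field.
import torch
import torch.nn as nn


class SelfAttention1d(nn.Module):
    """Self-attention over the time axis of a (batch, channels, time) map."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv1d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv1d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv1d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.query(x).permute(0, 2, 1)             # (b, t, c//8)
        k = self.key(x)                                # (b, c//8, t)
        attn = torch.softmax(torch.bmm(q, k), dim=-1)  # (b, t, t) temporal weights
        v = self.value(x)                              # (b, c, t)
        out = torch.bmm(v, attn.permute(0, 2, 1))      # mix values across time
        return self.gamma * out + x                    # residual connection


class AttentiveEncoderBlock(nn.Module):
    """One hypothetical stage of an attention-augmented generator encoder:
    strided 1-D convolution for downsampling, then self-attention."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size=31, stride=2, padding=15)
        self.act = nn.PReLU()
        self.attn = SelfAttention1d(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.attn(self.act(self.conv(x)))


if __name__ == "__main__":
    block = AttentiveEncoderBlock(1, 16)
    noisy = torch.randn(2, 1, 1024)  # batch of raw waveform segments
    print(block(noisy).shape)        # torch.Size([2, 16, 512])
```

The zero-initialised gamma lets the block start as an identity mapping and gradually learn how much attention to mix in, a common trick for stabilising GAN training when attention is added to convolutional generators.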

Keywords