IEEE Access (Jan 2024)

DMFNet: A Novel Self-Supervised Dynamic Multi-Focusing Network for Speech Denoising

  • Chenghao Yang
  • Yi Tao
  • Jingyin Liu
  • Xiaomei Xu

DOI
https://doi.org/10.1109/ACCESS.2024.3429398
Journal volume & issue
Vol. 12
pp. 98225 – 98238

Abstract

In recent years, speech denoising has greatly benefited from the rapid development of neural networks. However, these models require large numbers of paired noisy and clean speech samples for supervised training, which limits their widespread use. Although there have been attempts to train denoising networks with only noisy speech data, existing self-supervised methods often suffer from a lack of continuity, low noise reduction performance, or heavy dependence on noise modeling. In this work, we introduce an efficient self-supervised Dynamic Multi-Focusing Network (DMFNet), a speech denoising network trained only on noisy speech that uses a multi-scale connected encoder-decoder architecture as its backbone. Specifically, we design an efficient Spectral Dynamic Focusing Unit (SDFU) that enables the network to dynamically adapt the shape of its convolutional kernels while learning features, thus effectively focusing on the spectral structure of the human voice. Additionally, we introduce a Complex Attention Module (CAM), designed with a cross-space structure specialized for feature interaction and extraction. Finally, to further enhance the recovery of fine spectral details, we propose the Complex Multi-Scale Feature Fusion Unit (CMFFU) and Complex Scope Fusion Unit (CSFU) to adaptively fuse the features from different stages in the encoding process. Extensive evaluations across multiple datasets demonstrate that the proposed DMFNet significantly outperforms other state-of-the-art methods.

Keywords