DMFNet: A Novel Self-Supervised Dynamic Multi-Focusing Network for Speech Denoising

Chenghao Yang; Yi Tao; Jingyin Liu; Xiaomei Xu

doi:10.1109/ACCESS.2024.3429398

IEEE Access (Jan 2024)

Affiliations

Chenghao Yang: Department of Marine Technology and Engineering, Xiamen University, Xiamen, China
Yi Tao: Department of Marine Technology and Engineering, Xiamen University, Xiamen, China
Jingyin Liu: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
Xiaomei Xu: Department of Marine Technology and Engineering, Xiamen University, Xiamen, China

DOI: https://doi.org/10.1109/ACCESS.2024.3429398
Journal volume & issue: Vol. 12
pp. 98225 – 98238

Abstract

In recent years, speech denoising has greatly benefited from the rapid development of neural networks. However, these models require large numbers of noisy-clean speech pairs for supervised training, which limits their widespread use. Although there have been attempts to train denoising networks with only noisy speech data, existing self-supervised methods often suffer from a lack of continuity, low noise reduction performance, or heavy dependence on noise modeling. In this work, we introduce an efficient self-supervised Dynamic Multi-Focusing Network (DMFNet), a speech denoising network trained only on noisy speech that uses a multi-scale connected encoder-decoder architecture as its backbone. Specifically, we design an efficient Spectral Dynamic Focusing Unit (SDFU) that enables the network to dynamically adapt the shape of its convolutional kernels while learning features, and thus to focus on the spectral structure of the human voice. Additionally, we introduce a Complex Attention Module (CAM), designed with a cross-space structure specialized for feature interaction and extraction. Finally, to further enhance the recovery of fine spectral details, we propose the Complex Multi-Scale Feature Fusion Unit (CMFFU) and the Complex Scope Fusion Unit (CSFU), which adaptively fuse the features from different stages of the encoding process. Extensive evaluations across multiple datasets demonstrate that the proposed DMFNet significantly outperforms other state-of-the-art methods.
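
The abstract does not say how the SDFU adapts the shape of its kernels. As a rough illustration of the general idea only, the NumPy sketch below shows a 1-D convolution along a frequency axis in which every kernel tap is sampled at a shifted, possibly fractional position, which is the mechanism usually called deformable convolution. The function name `deformable_conv1d`, the `offsets` array, the edge clamping, and the linear interpolation are all assumptions made for this example and are not taken from the paper; in a real network the offsets would be predicted by a learned layer rather than supplied by hand.

```python
import numpy as np


def deformable_conv1d(x, weights, offsets):
    """Hypothetical 1-D deformable convolution (illustration only).

    x       : (F,) values along one axis, e.g. frequency bins of one frame
    weights : (K,) kernel taps, K odd
    offsets : (F, K) fractional shift applied to each tap at each output bin
    """
    F = x.shape[0]
    K = weights.shape[0]
    half = K // 2
    out = np.zeros(F, dtype=float)
    for f in range(F):
        for k in range(K):
            # Regular grid position of tap k, moved by its offset.
            pos = f + (k - half) + offsets[f, k]
            # Clamp to the valid range (assumed edge handling).
            pos = float(np.clip(pos, 0, F - 1))
            lo = int(np.floor(pos))
            hi = min(lo + 1, F - 1)
            frac = pos - lo
            # Linear interpolation between the two neighbouring bins.
            sample = (1.0 - frac) * x[lo] + frac * x[hi]
            out[f] += weights[k] * sample
    return out


# Usage: an identity kernel with zero offsets returns the input unchanged,
# while shifting the centre tap by half a bin reads between neighbouring bins.
x = np.arange(5, dtype=float)
w = np.array([0.0, 1.0, 0.0])
plain = deformable_conv1d(x, w, np.zeros((5, 3)))
shifted_offsets = np.zeros((5, 3))
shifted_offsets[:, 1] = 0.5
shifted = deformable_conv1d(x, w, shifted_offsets)
```

With zero offsets this reduces to an ordinary convolution with clamped edges, so the offsets are the only source of the "shape" adaptation in this sketch.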

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
