IEEE Access (Jan 2023)

Blind Source Separation Based on Improved Wave-U-Net Network

  • Chaofeng Lan,
  • Jingjuan Jiang,
  • Lei Zhang,
  • Zhen Zeng

DOI: https://doi.org/10.1109/ACCESS.2023.3330160
Journal volume & issue: Vol. 11, pp. 125951 – 125958

Abstract


With the development and widespread application of voice interaction technology, improving the accuracy of blind source separation has become crucial. To further improve the separation of vocals and accompaniment, this paper proposes an improved Wave-U-Net model. We propose a segmented attention module (SAM), consisting of a spatial attention module (SPAM) and a channel attention module (CAM), to replace the skip connections of the Wave-U-Net model and thereby bridge the semantic gap caused by direct feature concatenation. Furthermore, we replace the 1D convolution layer in the bottleneck of the model with an atrous spatial pyramid pooling (ASPP) module, which enlarges the receptive field while capturing multi-scale features, improving the separation performance of the model. We conduct experiments on the MUSDB18 dataset and analyze the performance of the model using the SDR, SIR, and SAR evaluation metrics. The results show that, compared with the Wave-U-Net network that uses only feature concatenation, the SDR values of the restored vocals and restored accompaniment increase by 4.229 dB and 4.626 dB, respectively, and the separation performance exceeds that of several existing baseline models.
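The paper itself gives no code, but the two mechanisms named in the abstract can be illustrated compactly. The sketch below shows, under stated assumptions, (a) an ASPP-style bank of parallel dilated 1D convolutions concatenated across channels, as would sit at the bottleneck, and (b) a squeeze-and-excitation-style channel attention of the kind used in a CAM. All function names, tensor shapes, and the dilation rates (1, 2, 4, 8) are illustrative assumptions, not details from the paper.

```python
import numpy as np

def dilated_conv1d(x, w, rate):
    """Naive 'same'-padded dilated 1D convolution (illustrative, not optimized).
    x: (in_ch, T) feature map; w: (out_ch, in_ch, k) filter bank with k odd."""
    out_ch, in_ch, k = w.shape
    span = (k - 1) * rate + 1                 # effective receptive field of the kernel
    pad = span // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))      # zero-pad so output length equals T
    out = np.zeros((out_ch, x.shape[1]))
    for t in range(x.shape[1]):
        taps = xp[:, t : t + span : rate]     # (in_ch, k) samples spaced by `rate`
        out[:, t] = np.tensordot(w, taps, axes=([1, 2], [0, 1]))
    return out

def aspp_1d(x, branch_weights, rates=(1, 2, 4, 8)):
    """ASPP-style bank: parallel dilated convolutions at several rates,
    concatenated along the channel axis to mix multi-scale context.
    The rates here are an assumed example, not the paper's configuration."""
    return np.concatenate(
        [dilated_conv1d(x, w, r) for w, r in zip(branch_weights, rates)], axis=0
    )

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation-style channel attention on a (channels, T) map:
    pool over time, pass through a small bottleneck, and gate each channel."""
    s = x.mean(axis=1)                        # squeeze: global average over time
    z = np.maximum(w1 @ s, 0.0)               # excitation bottleneck with ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ z)))   # sigmoid gates in (0, 1)
    return x * gates[:, None]                 # reweight each channel of the map
```

Because larger dilation rates sample the same kernel over a wider time span, the concatenated branches see context at several scales at once, which is the stated purpose of replacing the bottleneck's plain 1D convolution.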

Keywords