EURASIP Journal on Audio, Speech, and Music Processing (Nov 2021)

A multichannel learning-based approach for sound source separation in reverberant environments

  • You-Siang Chen,
  • Zi-Jie Lin,
  • Mingsian R. Bai

DOI
https://doi.org/10.1186/s13636-021-00227-2
Journal volume & issue
Vol. 2021, no. 1
pp. 1 – 12

Abstract

In this paper, a multichannel learning-based network is proposed for sound source separation in a reverberant field. The network can be divided into two parts according to the training strategy. In the first stage, time-dilated convolutional blocks are trained to estimate the array weights for beamforming the multichannel microphone signals. Next, the output of the network is processed by a weight-and-sum operation that is reformulated to handle real-valued data in the frequency domain. In the second stage, a U-net model is concatenated to the beamforming network to serve as a non-linear mapping filter for joint separation and dereverberation. The scale-invariant mean square error (SI-MSE), a frequency-domain modification of the scale-invariant signal-to-noise ratio (SI-SNR), is used as the objective function for training. Furthermore, the combined network is trained on speech segments filtered by a wide variety of room impulse responses. Simulations are conducted for comprehensive multisource scenarios with various subtending angles between sources and various reverberation times. The proposed network is compared with several baseline approaches in terms of objective evaluation metrics. The results demonstrate the excellent dereverberation and separation performance of the proposed network compared with the baseline methods.
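The SI-MSE objective described above is a frequency-domain variant of the widely used SI-SNR; the paper's exact formulation is not given here, but the standard time-domain SI-SNR it is derived from can be sketched as follows (a minimal illustration, not the authors' implementation; the signal lengths and `eps` guard are assumptions):

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR (dB) between an estimated and a reference signal."""
    # Project the estimate onto the target so that any overall gain
    # mismatch does not affect the score (the "scale-invariant" part).
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = alpha * target          # target component of the estimate
    e_noise = estimate - s_target      # residual (interference + artifacts)
    return 10.0 * np.log10(
        np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps)
    )

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)
# A rescaled copy of the target is "perfect" under SI-SNR: the projection
# absorbs the gain, so the residual is negligible and the score is very high.
print(si_snr(3.0 * s, s))
```

The SI-MSE used by the paper applies the same scale-invariance idea to a mean-square-error loss on frequency-domain (STFT) representations rather than raw waveforms.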

Keywords