Multi-Channel Bin-Wise Speech Separation Combining Time-Frequency Masking and Beamforming

Mostafa Bella; Hicham Saylani; Shahram Hosseini; Yannick Deville

doi:10.1109/ACCESS.2023.3315596

IEEE Access (Jan 2023)

Affiliations

Mostafa Bella: IRAP, UPS, CNRS, CNES, Université de Toulouse, Toulouse, France
Hicham Saylani: MatSim, Faculté des Sciences, Université Ibnou Zohr, Agadir, Morocco
Shahram Hosseini: IRAP, UPS, CNRS, CNES, Université de Toulouse, Toulouse, France
Yannick Deville: IRAP, UPS, CNRS, CNES, Université de Toulouse, Toulouse, France

Journal volume & issue: Vol. 11, pp. 100632–100645

Abstract

This paper presents a novel Blind Source Separation method for convolutive mixtures, including underdetermined ones. The method combines Time-Frequency (TF) masking with beamforming and exploits the sparsity of the source signals in the TF domain. TF masking-based methods can achieve remarkable performance, even in the underdetermined case, but they tend to introduce unwanted artifacts into the separated signals. Beamforming techniques, by contrast, do not distort the estimated signals, but they achieve satisfactory performance only in the determined and overdetermined cases. Combining the two approaches lets us leverage their respective strengths. First, we exploit the sparsity of the source signals in the TF domain to estimate probabilistic “bin-wise” masks by modeling the frequency observation vectors with a complex Gaussian Mixture Model and fitting it with an Expectation-Maximization (EM) algorithm. Because the EM algorithm is sensitive to initialization, we propose selecting the initial values of the model parameters using Hermitian angles between the frequency observation vectors and a reference vector. Next, we use the estimated TF masks to estimate the Relative Transfer Functions of each source. Finally, we propose a new technique for estimating the spatial images of the separated sources, which can be regarded as an underdetermined extension of the Linearly Constrained Minimum Power beamformer. In tests, our method performed well in both the determined and underdetermined cases compared with various existing methods based on similar working hypotheses.
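Two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `hermitian_angles` and `lcmp_weights`, the array shapes, and the toy data are assumptions made for illustration. The first function uses the standard definition of the Hermitian angle between complex vectors, and the second gives the classical (determined-case) Linearly Constrained Minimum Power weights, not the underdetermined extension proposed in the paper.

```python
import numpy as np


def hermitian_angles(X, ref):
    """Hermitian angle between each column of X (M x N) and ref (M,).

    Standard definition: cos(theta) = |ref^H x| / (||ref|| ||x||).
    The angle is invariant to a complex scaling of x.
    """
    inner = np.abs(ref.conj() @ X)                      # |ref^H x_n| for every column
    norms = np.linalg.norm(ref) * np.linalg.norm(X, axis=0)
    cos_theta = np.clip(inner / np.maximum(norms, 1e-12), 0.0, 1.0)
    return np.arccos(cos_theta)


def lcmp_weights(R, C, g):
    """Classical LCMP weights: minimise w^H R w subject to C^H w = g.

    Closed form: w = R^{-1} C (C^H R^{-1} C)^{-1} g.
    R is the (M x M) observation covariance, C (M x K) holds one
    steering / relative-transfer-function vector per constraint.
    """
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, g)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, N = 3, 200                                       # hypothetical: 3 microphones, 200 frames

    # Toy observation vectors for one frequency bin (assumed data).
    X = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
    ref = np.ones(M, dtype=complex)                     # assumed reference vector
    theta = hermitian_angles(X, ref)                    # one angle per frame, in [0, pi/2]

    # Two assumed RTF-like vectors, each normalised to its first entry.
    C = rng.standard_normal((M, 2)) + 1j * rng.standard_normal((M, 2))
    C = C / C[0]
    R = X @ X.conj().T / N + 1e-6 * np.eye(M)           # regularised sample covariance
    g = np.array([1.0, 0.0], dtype=complex)             # pass source 1, null source 2
    w = lcmp_weights(R, C, g)
    print(theta.shape, np.allclose(C.conj().T @ w, g))
```

In a pipeline of this kind, such angles could be used to group frames before running EM, and `C` would hold RTF estimates obtained from the masks. How the paper actually carries out these steps is not specified in the abstract.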

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
