DOA-informed switching independent vector extraction and beamforming for speech enhancement in underdetermined situations

Tetsuya Ueda; Tomohiro Nakatani; Rintaro Ikeshita; Shoko Araki; Shoji Makino

doi:10.1186/s13636-024-00373-3

EURASIP Journal on Audio, Speech, and Music Processing (Oct 2024)

Affiliations

Tetsuya Ueda: Waseda University
Tomohiro Nakatani: NTT Corporation
Rintaro Ikeshita: NTT Corporation
Shoko Araki: NTT Corporation
Shoji Makino: Waseda University

DOI: https://doi.org/10.1186/s13636-024-00373-3
Journal volume & issue: Vol. 2024, no. 1, pp. 1–20

Abstract

This paper proposes novel methods for extracting a single Speech signal of Interest (SOI) from a multichannel observed signal in underdetermined situations, i.e., when the observed signal contains more speech signals than microphones. It focuses on extracting the SOI using prior knowledge of the SOI’s Direction of Arrival (DOA). Conventional beamformers (BFs) and Blind Source Separation (BSS) with spatial regularization struggle to suppress interference speech signals in such situations. Although Switching Minimum Power Distortionless Response BF (Sw-MPDR) can handle underdetermined situations using a switching mechanism, its estimation accuracy significantly decreases when it relies on a steering vector determined by the SOI’s DOA. Spatially-Regularized Independent Vector Extraction (SRIVE) can robustly enhance the SOI based solely on its DOA using spatial regularization, but its performance degrades in underdetermined situations. This paper extends these conventional methods to overcome their limitations. First, we introduce a time-varying Gaussian (TVG) source model to Sw-MPDR to effectively enhance the SOI based solely on the DOA. Second, we introduce the switching mechanism to SRIVE to improve its speech enhancement performance in underdetermined situations. These two proposed methods are called Switching weighted MPDR (Sw-wMPDR) and Switching SRIVE (Sw-SRIVE). We experimentally demonstrate that both surpass conventional methods in enhancing the SOI using the DOA in underdetermined situations.
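
As a rough illustration of the ideas named in the abstract, the sketch below computes a DOA-based steering vector and a variance-weighted MPDR beamformer for a single frequency bin. It is not the authors' implementation. It assumes a far-field linear array and a TVG model in which each frame is weighted by the inverse of a known source variance, and it leaves out the switching mechanism entirely. All function names, the array geometry, and the diagonal-loading constant `eps` are hypothetical choices made for this example; in practice the variances would have to be estimated.

```python
import cmath
import math


def steering_vector(doa_deg, mic_positions_m, freq_hz, c=343.0):
    """Far-field steering vector for a linear array (assumed geometry)."""
    cos_theta = math.cos(math.radians(doa_deg))
    return [cmath.exp(-2j * math.pi * freq_hz * p * cos_theta / c)
            for p in mic_positions_m]


def weighted_covariance(frames, variances, eps=1e-6):
    """R = (1/T) * sum_t x_t x_t^H / lambda_t, plus eps on the diagonal.

    lambda_t is the (assumed known) time-varying source variance;
    eps is hypothetical diagonal loading to keep R invertible.
    """
    m = len(frames[0])
    acc = [[0j] * m for _ in range(m)]
    for x, lam in zip(frames, variances):
        scale = 1.0 / max(lam, eps)
        for i in range(m):
            for j in range(m):
                acc[i][j] += x[i] * x[j].conjugate() * scale
    t = len(frames)
    return [[acc[i][j] / t + (eps if i == j else 0.0) for j in range(m)]
            for i in range(m)]


def solve(a_mat, b):
    """Solve a_mat @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a_mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0j] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def wmpdr_weights(frames, variances, a):
    """w = R^{-1} a / (a^H R^{-1} a), with R the weighted covariance."""
    r = weighted_covariance(frames, variances)
    r_inv_a = solve(r, a)
    denom = sum(ai.conjugate() * v for ai, v in zip(a, r_inv_a))
    return [v / denom for v in r_inv_a]


def apply_beamformer(w, x):
    """Beamformer output y = w^H x for one time-frequency point."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))
```

Setting every variance to 1 reduces the weighted covariance to the ordinary sample covariance, which gives a plain MPDR beamformer. By construction the weights satisfy the distortionless constraint `apply_beamformer(w, a) == 1` up to rounding, so a signal arriving exactly from the assumed DOA passes unchanged.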

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com
