IEEE Access (Jan 2025)

Real-Time Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation and Spatially Regularized Independent Low-Rank Matrix Analysis With Fast Demixing Matrix Estimation

  • Yuto Ishikawa,
  • Tomohiko Nakamura,
  • Norihiro Takamune,
  • Daichi Kitamura,
  • Hiroshi Saruwatari,
  • Yu Takahashi,
  • Kazunobu Kondo

DOI
https://doi.org/10.1109/access.2025.3569590
Journal volume & issue
Vol. 13
pp. 88683–88706

Abstract

Real-time speech extraction is a valuable task with diverse applications, such as speech recognition in human-like avatars/robots and hearing aids. In this paper, we propose a real-time extension of a speech extraction method based on independent low-rank matrix analysis (ILRMA) and rank-constrained spatial covariance matrix estimation (RCSCME). In an offline scenario, the RCSCME-based method (a multichannel blind speech extraction method combining ILRMA and RCSCME) has been reported to achieve superior speech extraction performance under diffuse noise conditions. We exploit two facts: the only ILRMA output required by RCSCME is the time-invariant demixing matrix, and the entire RCSCME-based method can be divided into two parts, the ILRMA part and the RCSCME part. To run the RCSCME-based method in real time, we therefore introduce a blockwise batch algorithm that performs the ILRMA and RCSCME parts in parallel. To improve real-time speech extraction performance, we introduce spatial regularization into the ILRMA part and devise two regularizers. For further acceleration and numerical stabilization, we derive new algorithms for vectorwise coordinate descent (VCD) and iterative projection (IP) that are analytically equivalent to the conventional ones. In experiments, we first confirm the effectiveness of the proposed VCD algorithm in terms of both computational time and numerical stability. Next, we show that the proposed real-time framework with the proposed VCD/IP algorithms achieves superior speech extraction performance compared with conventional methods and can run in real time on limited computational resources. Finally, we demonstrate the effectiveness of the designed regularizers in terms of speech extraction performance and the robustness of the proposed methods to errors in the prior information.
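The blockwise batch idea described in the abstract can be pictured as follows: since RCSCME only needs the time-invariant demixing matrix from ILRMA, the heavy ILRMA iterations can run in the background on the most recent block of frames while RCSCME processes each incoming frame with whatever demixing matrix is currently available. The sketch below is not the authors' implementation; the function bodies, block length, and array shapes are placeholders chosen only to illustrate this parallel structure.

```python
# Minimal sketch (assumptions, not the paper's code) of running the ILRMA part
# blockwise in the background while the RCSCME part processes frames in real time.
import threading
import queue
import numpy as np

BLOCK_FRAMES = 128          # assumed block length for the background ILRMA update
N_CH, N_FREQ = 4, 257       # assumed channel count and number of frequency bins


def ilrma_update(block, W_init):
    """Placeholder for the ILRMA part: estimate a demixing matrix per frequency.

    The actual method would run IP/VCD iterations on the block; here we simply
    return the initial matrix to keep the sketch self-contained.
    """
    return W_init


def rcscme_frame(frame, W):
    """Placeholder for the RCSCME part applied to one STFT frame.

    Real RCSCME also estimates the rank-constrained noise spatial covariance
    and applies a post-filter; here we only apply the current demixing matrix.
    """
    return np.einsum('fnm,fm->fn', W, frame)


W = np.tile(np.eye(N_CH, dtype=complex), (N_FREQ, 1, 1))  # shared demixing matrix
W_lock = threading.Lock()
frames_in = queue.Queue()


def background_ilrma():
    """Accumulate frames into blocks and refresh the demixing matrix per block."""
    global W
    buf = []
    while True:
        frame = frames_in.get()
        if frame is None:
            break
        buf.append(frame)
        if len(buf) >= BLOCK_FRAMES:
            block = np.stack(buf)            # (frames, freq, ch)
            with W_lock:
                W = ilrma_update(block, W)
            buf.clear()


worker = threading.Thread(target=background_ilrma, daemon=True)
worker.start()


def process_frame(frame):
    """Real-time path: called once per incoming STFT frame."""
    frames_in.put(frame)                     # feed the background ILRMA block
    with W_lock:
        W_now = W.copy()                     # latest available demixing matrix
    return rcscme_frame(frame, W_now)


if __name__ == "__main__":
    for _ in range(10):
        x = np.random.randn(N_FREQ, N_CH) + 1j * np.random.randn(N_FREQ, N_CH)
        out = process_frame(x)
    frames_in.put(None)
```

Because the framewise RCSCME path never waits for the ILRMA iterations to finish, the extraction latency is set by the per-frame processing alone, which is the property the abstract's real-time framework relies on.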

Keywords