IEEE Access (Jan 2025)

Real-Time Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation and Spatially Regularized Independent Low-Rank Matrix Analysis With Fast Demixing Matrix Estimation

  • Yuto Ishikawa,
  • Tomohiko Nakamura,
  • Norihiro Takamune,
  • Daichi Kitamura,
  • Hiroshi Saruwatari,
  • Yu Takahashi,
  • Kazunobu Kondo

DOI
https://doi.org/10.1109/access.2025.3569590
Journal volume & issue
Vol. 13
pp. 88683–88706

Abstract

Real-time speech extraction is a valuable task with diverse applications, such as speech recognition in human-like avatars/robots and hearing aids. In this paper, we propose a real-time extension of a speech extraction method based on independent low-rank matrix analysis (ILRMA) and rank-constrained spatial covariance matrix estimation (RCSCME). In an offline scenario, the RCSCME-based method (a multichannel blind speech extraction method combining ILRMA and RCSCME) has been reported to achieve superior speech extraction performance under diffuse noise conditions. We exploit two facts: the only ILRMA output required by RCSCME is the time-invariant demixing matrix, and the entire RCSCME-based method can be divided into two parts, the ILRMA part and the RCSCME part. To run the RCSCME-based method in real time, we therefore introduce a blockwise batch algorithm that performs the ILRMA and RCSCME parts in parallel. To improve real-time speech extraction performance, we introduce spatial regularization into the ILRMA part and devise two regularizers. For further acceleration and numerical stabilization, we derive new algorithms for vectorwise coordinate descent (VCD) and iterative projection (IP) that are analytically equivalent to the conventional ones. In experiments, we first confirm the effectiveness of the proposed VCD algorithm in terms of both computational time and numerical stability. Next, we show that the proposed real-time framework with the proposed VCD/IP algorithms achieves superior speech extraction performance compared with conventional methods and can run in real time on limited computational resources. Finally, we demonstrate the effectiveness of the designed regularizers in terms of speech extraction performance and the robustness of the proposed methods to errors in the prior information.
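The blockwise batch idea described in the abstract can be pictured as follows: since RCSCME only needs the time-invariant demixing matrix from ILRMA, the heavy ILRMA iterations can run in the background on the most recent block of frames while RCSCME processes each incoming frame with whatever demixing matrix is currently available. The sketch below is not the authors' implementation; the function bodies, block length, and array shapes are placeholders chosen only to illustrate this parallel structure.

```python
# Minimal sketch (assumptions, not the paper's code) of running the ILRMA part
# blockwise in the background while the RCSCME part processes frames in real time.
import threading
import queue
import numpy as np

BLOCK_FRAMES = 128          # assumed block length for the background ILRMA update
N_CH, N_FREQ = 4, 257       # assumed channel count and number of frequency bins


def ilrma_update(block, W_init):
    """Placeholder for the ILRMA part: estimate a demixing matrix per frequency.

    The actual method would run IP/VCD iterations on the block; here we simply
    return the initial matrix to keep the sketch self-contained.
    """
    return W_init


def rcscme_frame(frame, W):
    """Placeholder for the RCSCME part applied to one STFT frame.

    Real RCSCME also estimates the rank-constrained noise spatial covariance
    and applies a post-filter; here we only apply the current demixing matrix.
    """
    return np.einsum('fnm,fm->fn', W, frame)


W = np.tile(np.eye(N_CH, dtype=complex), (N_FREQ, 1, 1))  # shared demixing matrix
W_lock = threading.Lock()
frames_in = queue.Queue()


def background_ilrma():
    """Accumulate frames into blocks and refresh the demixing matrix per block."""
    global W
    buf = []
    while True:
        frame = frames_in.get()
        if frame is None:
            break
        buf.append(frame)
        if len(buf) >= BLOCK_FRAMES:
            block = np.stack(buf)            # (frames, freq, ch)
            with W_lock:
                W = ilrma_update(block, W)
            buf.clear()


worker = threading.Thread(target=background_ilrma, daemon=True)
worker.start()


def process_frame(frame):
    """Real-time path: called once per incoming STFT frame."""
    frames_in.put(frame)                     # feed the background ILRMA block
    with W_lock:
        W_now = W.copy()                     # latest available demixing matrix
    return rcscme_frame(frame, W_now)


if __name__ == "__main__":
    for _ in range(10):
        x = np.random.randn(N_FREQ, N_CH) + 1j * np.random.randn(N_FREQ, N_CH)
        out = process_frame(x)
    frames_in.put(None)
```

Because the framewise RCSCME path never waits for the ILRMA iterations to finish, the extraction latency is set by the per-frame processing alone, which is the property the abstract's real-time framework relies on.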

Keywords