EURASIP Journal on Audio, Speech, and Music Processing (Sep 2024)

A time-frequency fusion model for multi-channel speech enhancement

  • Xiao Zeng,
  • Shiyun Xu,
  • Mingjiang Wang

DOI
https://doi.org/10.1186/s13636-024-00367-1
Journal volume & issue
Vol. 2024, no. 1
pp. 1 – 12

Abstract

Multi-channel speech enhancement plays a critical role in numerous speech-related applications. Several previous works have used deep neural networks (DNNs) to exploit tempo-spectral signal characteristics, often with excellent results. In this work, we present a time-frequency fusion model, TFFM, for multi-channel speech enhancement. We employ three cascaded U-Nets to capture three types of high-resolution features and investigate their individual contributions. Specifically, the first U-Net preserves the time dimension and extracts features along the frequency dimension, yielding high-resolution spectral features with global temporal information; the second preserves the frequency dimension and extracts features along the time dimension, yielding high-resolution temporal features with global spectral information; and the third downsamples and upsamples along both dimensions, yielding high-resolution tempo-spectral features. Together, the cascaded U-Nets aggregate local and global features and thereby handle the tempo-spectral information of speech signals effectively. The proposed TFFM outperforms state-of-the-art baselines.
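
The three-U-Net cascade described above maps naturally onto axis-specific strided convolutions. The following is a minimal PyTorch sketch of that idea, assuming a (batch, channels, time, freq) feature layout; the names AxisUNet, TFFMSketch, and ch are hypothetical illustrations, not the authors' implementation, and each "U-Net" is reduced to a single down/up level with a skip connection.

```python
# Hypothetical sketch of the axis-specific U-Net cascade from the abstract.
# Not the authors' code: each U-Net is collapsed to one down/up level.
import torch
import torch.nn as nn


class AxisUNet(nn.Module):
    """Tiny single-level U-Net that compresses only the axes with stride > 1.

    stride=(1, 2): time kept, frequency downsampled/upsampled (first U-Net);
    stride=(2, 1): frequency kept, time downsampled/upsampled (second U-Net);
    stride=(2, 2): both axes downsampled/upsampled (third U-Net).
    """

    def __init__(self, ch, stride):
        super().__init__()
        k = tuple(3 if s > 1 else 1 for s in stride)  # convolve only along strided axes
        p = tuple(kk // 2 for kk in k)
        self.down = nn.Sequential(
            nn.Conv2d(ch, ch, k, stride=stride, padding=p), nn.ReLU()
        )
        self.up = nn.ConvTranspose2d(
            ch, ch, k, stride=stride, padding=p,
            output_padding=tuple(s - 1 for s in stride),
        )

    def forward(self, x):
        y = self.up(self.down(x))
        y = y[..., : x.shape[-2], : x.shape[-1]]  # crop back to input size
        return x + y                              # U-Net-style skip connection


class TFFMSketch(nn.Module):
    """Cascade of the three axis-specific U-Nets, as outlined in the abstract."""

    def __init__(self, ch=16):
        super().__init__()
        self.freq_unet = AxisUNet(ch, stride=(1, 2))  # high-res spectral features
        self.time_unet = AxisUNet(ch, stride=(2, 1))  # high-res temporal features
        self.tf_unet = AxisUNet(ch, stride=(2, 2))    # high-res tempo-spectral features

    def forward(self, x):
        return self.tf_unet(self.time_unet(self.freq_unet(x)))


if __name__ == "__main__":
    feats = torch.randn(1, 16, 100, 64)  # (batch, channels, time_frames, freq_bins)
    out = TFFMSketch(ch=16)(feats)
    print(out.shape)  # torch.Size([1, 16, 100, 64]) -- shape is preserved
```

The design choice the sketch highlights is that the receptive field grows only along the strided axes, so the first two U-Nets keep full resolution on one axis while gathering global context on the other, and the third mixes both.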

Keywords