Mixed T-domain and TF-domain Magnitude and Phase representations for GAN-based speech enhancement

Xin Lin; Yang Zhang; Shiyuan Wang

doi:10.1038/s41598-024-68708-w

Scientific Reports (Jul 2024)

Affiliations

Xin Lin: College of Electronic and Information Engineering, Southwest University
Yang Zhang: College of Electronic and Information Engineering, Southwest University
Shiyuan Wang: College of Electronic and Information Engineering, Southwest University

Journal volume & issue: Vol. 14, no. 1
pp. 1 – 13

Abstract

Deep learning has driven significant advances in speech enhancement, which plays a crucial role in improving the quality of speech signals in noisy conditions. In this paper, we propose a new approach called M-DGAN, which introduces a time (T)-domain encoder-decoder structure with rich channel representations into the time-frequency (TF)-domain generator framework, yielding a new generator structure with mixed magnitude and phase representations in the T and TF domains. The proposed mixed T-domain and TF-domain generator, which incorporates the cascaded reworked conformer (CRC) structure, exhibits improved modeling capability and adaptability. Test results on the public Voice Bank + DEMAND dataset show that our method achieves the highest score, PESQ = 3.52, and performs well on all remaining metrics when compared with current state-of-the-art methods. In addition, tests on the NISQA_TEST_LIVETALK real-world dataset of the NISQA Corpus demonstrate the breadth and robustness of our model on speech enhancement tasks.
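As a rough illustration of what a TF-domain magnitude and phase representation is, the sketch below splits a noisy T-domain waveform into windowed frames, takes the Fourier transform of each frame, and separates the result into magnitude and phase. This is not the M-DGAN generator and is not taken from the paper; the function name `stft_mag_phase`, the frame length, the hop size, and the synthetic test signal are all assumptions chosen only for the example.

```python
import numpy as np

def stft_mag_phase(signal, frame_len=512, hop=128):
    """Return (magnitude, phase) of a simple short-time Fourier transform.

    Hypothetical helper for illustration only; frame_len and hop are
    arbitrary choices, not values reported in the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the T-domain waveform into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT per frame gives the complex TF-domain spectrum.
    spectrum = np.fft.rfft(frames, axis=1)
    return np.abs(spectrum), np.angle(spectrum)

# Synthetic "noisy speech" stand-in: a 440 Hz tone plus Gaussian noise at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
noisy = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(t.size)

magnitude, phase = stft_mag_phase(noisy)

# Magnitude and phase together carry the full complex spectrum.
recombined = magnitude * np.exp(1j * phase)
```

A TF-domain enhancement model would typically predict a cleaned magnitude, a cleaned phase, or both from inputs like these, whereas a T-domain model operates on the waveform `noisy` directly; the abstract describes a generator that combines the two views.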

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
