Speech Enhancement Based on Time-Frequency Domain GAN

YIN Wen-bing, GAO Ge, ZENG Bang, WANG Xiao, CHEN Yi

doi:10.11896/jsjkx.210500114

Jisuanji kexue (Jun 2022)

Affiliations

YIN Wen-bing, GAO Ge, ZENG Bang, WANG Xiao, CHEN Yi: 1 National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan 430072, China; 2 School of Computer Science, Central China Normal University, Wuhan 430077, China

DOI: https://doi.org/10.11896/jsjkx.210500114
Journal volume & issue: Vol. 49, no. 6
pp. 187 – 192

Abstract

The traditional speech enhancement algorithm based on generative adversarial networks (SEGAN) enhances speech in the time domain and completely ignores the distribution of speech samples in the frequency domain. At low signal-to-noise ratios, the speech signal is submerged in noise, and the time-domain distribution information of noisy speech is difficult to capture. As a result, the enhancement performance of SEGAN drops sharply, and the quality and intelligibility of its enhanced speech are very low. To solve this problem, this paper proposes a speech enhancement algorithm based on a time-frequency domain generative adversarial network (time-frequency domain SEGAN, TFSEGAN). TFSEGAN adopts a model structure with dual discriminators, one for the time domain and one for the frequency domain, together with a time-frequency L1 loss function. The input of the time-domain discriminator is the time-domain feature of the speech sample, and the input of the frequency-domain discriminator is the frequency-domain feature of the speech sample. During training, the time-domain discriminator uses the time-domain distribution information of the speech sample as its criterion, and the frequency-domain discriminator uses the frequency-domain distribution information of the speech sample as its criterion. Under the action of the two discriminators, the generator of TFSEGAN can simultaneously learn the distribution of speech samples in both the time domain and the frequency domain. Experiments show that, compared with SEGAN, TFSEGAN improves speech quality and intelligibility by about 17.45% and 11.75%, respectively, at low signal-to-noise ratios.
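The abstract does not give the exact form of the time-frequency L1 loss or of the generator objective, so the following is only a minimal NumPy sketch of one plausible reading: an L1 distance on the waveform plus an L1 distance on STFT magnitudes, added to least-squares adversarial terms from the two discriminators. The function names, the STFT settings (`frame_len`, `hop`), the weights (`alpha`, `beta`, `lam`), and the use of a least-squares adversarial term are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def stft_magnitude(x, frame_len=512, hop=128):
    """Magnitude spectrogram from a Hann-windowed STFT, shape (frames, bins).

    frame_len and hop are illustrative values, not the paper's settings.
    Requires len(x) >= frame_len.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def time_frequency_l1(enhanced, clean, alpha=1.0, beta=1.0):
    """Weighted sum of a time-domain and a frequency-domain L1 distance.

    alpha and beta are hypothetical weights.
    """
    l1_time = np.mean(np.abs(enhanced - clean))
    l1_freq = np.mean(np.abs(stft_magnitude(enhanced) - stft_magnitude(clean)))
    return alpha * l1_time + beta * l1_freq


def generator_loss(d_time_score, d_freq_score, enhanced, clean, lam=100.0):
    """Assumed generator objective with two discriminators.

    d_time_score / d_freq_score stand for the outputs of the time-domain and
    frequency-domain discriminators on the enhanced speech. The generator is
    pushed to make both scores approach 1 (least-squares form, an assumption),
    while the L1 term keeps the output close to the clean reference.
    """
    adv_time = 0.5 * np.mean((np.asarray(d_time_score) - 1.0) ** 2)
    adv_freq = 0.5 * np.mean((np.asarray(d_freq_score) - 1.0) ** 2)
    return adv_time + adv_freq + lam * time_frequency_l1(enhanced, clean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16384) / 16000.0            # about 1 s at 16 kHz
    clean = np.sin(2 * np.pi * 440.0 * t)     # synthetic stand-in for speech
    noisy = clean + 0.5 * rng.standard_normal(clean.shape)
    print("loss(clean, clean):", time_frequency_l1(clean, clean))
    print("loss(noisy, clean):", time_frequency_l1(noisy, clean))
```

In this sketch the frequency-domain term penalises spectral differences that a waveform-only L1 loss does not measure directly, which is the kind of frequency-domain information the abstract says a time-domain-only model ignores.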

speech enhancement|generative adversarial network|time-frequency domain|low signal-to-noise ratio|speech quality|speech intelligibility

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
