Multi-resolution auditory cepstral coefficient and adaptive mask for speech enhancement with deep neural network

Ruwei Li; Xiaoyue Sun; Yanan Liu; Dengcai Yang; Liang Dong

doi:10.1186/s13634-019-0618-4

EURASIP Journal on Advances in Signal Processing (Apr 2019)

Affiliations

Ruwei Li: Beijing Key Lab of Computational Intelligence and Intelligent System, Faculty of Information Technology, School of Information and Communications Engineering, Beijing University of Technology
Xiaoyue Sun: Beijing Key Lab of Computational Intelligence and Intelligent System, Faculty of Information Technology, School of Information and Communications Engineering, Beijing University of Technology
Yanan Liu: Beijing Key Lab of Computational Intelligence and Intelligent System, Faculty of Information Technology, School of Information and Communications Engineering, Beijing University of Technology
Dengcai Yang: Beijing Key Lab of Computational Intelligence and Intelligent System, Faculty of Information Technology, School of Information and Communications Engineering, Beijing University of Technology
Liang Dong: Electrical and Computer Engineering, Baylor University

Journal volume & issue: Vol. 2019, no. 1
pp. 1 – 16

Abstract

Existing speech enhancement algorithms perform poorly in non-stationary noise environments with a low signal-to-noise ratio (SNR). To address this problem, this paper presents a novel speech enhancement algorithm based on multi-feature and adaptive mask with deep learning. First, we construct a new feature called the multi-resolution auditory cepstral coefficient (MRACC). This feature, which is extracted from four cochleagrams of different resolutions, captures both local information and spectrotemporal context while reducing algorithm complexity. Second, we propose an adaptive mask (AM) that can track noise changes for speech enhancement. The AM flexibly combines the advantages of the ideal binary mask (IBM) and the ideal ratio mask (IRM) as the SNR changes. Third, a deep neural network (DNN) is used as a nonlinear function to estimate the AM, with the MRACC and its first and second derivatives as the DNN input. Finally, the estimated AM is used to weight the noisy speech to obtain the enhanced speech. Experimental results show that the proposed algorithm further improves speech quality and intelligibility and suppresses more noise than the comparison algorithms. In addition, the proposed algorithm has lower complexity than the comparison algorithms.
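The masking steps summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the AM formula, so the logistic SNR-dependent weight below (`adaptive_mask_sketch`, `slope`) is a hypothetical choice made only for illustration, including the direction of the blend, and is not the authors' definition. The IBM and IRM follow their commonly used forms; the threshold `lc_db`, exponent `beta`, and the gradient-based derivative features in `add_deltas` are likewise assumptions. In training, the masks would be computed from known clean speech and noise, and the DNN would estimate the mask from noisy features at test time.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)

def local_snr_db(speech_pow, noise_pow):
    """Per time-frequency-unit SNR in dB from speech and noise power."""
    return 10.0 * np.log10((speech_pow + EPS) / (noise_pow + EPS))

def ideal_binary_mask(speech_pow, noise_pow, lc_db=0.0):
    """IBM: 1 where the local SNR exceeds the criterion lc_db, else 0."""
    return (local_snr_db(speech_pow, noise_pow) > lc_db).astype(float)

def ideal_ratio_mask(speech_pow, noise_pow, beta=0.5):
    """IRM: soft mask (S / (S + N)) ** beta with values in [0, 1]."""
    return (speech_pow / (speech_pow + noise_pow + EPS)) ** beta

def adaptive_mask_sketch(speech_pow, noise_pow, slope=0.5):
    """Hypothetical SNR-dependent blend of IBM and IRM.

    ASSUMPTION: the logistic weight and the blend direction are
    illustrative only; the abstract does not specify them.
    """
    alpha = 1.0 / (1.0 + np.exp(-slope * local_snr_db(speech_pow, noise_pow)))
    ibm = ideal_binary_mask(speech_pow, noise_pow)
    irm = ideal_ratio_mask(speech_pow, noise_pow)
    return alpha * ibm + (1.0 - alpha) * irm

def add_deltas(features):
    """Stack features (dims x frames) with first and second time derivatives.

    ASSUMPTION: derivatives are approximated with np.gradient along time.
    """
    d1 = np.gradient(features, axis=1)
    d2 = np.gradient(d1, axis=1)
    return np.concatenate([features, d1, d2], axis=0)

def apply_mask(mask, noisy_mag):
    """Weight the noisy magnitude (or cochleagram) element-wise by the mask."""
    return mask * noisy_mag
```

Because the blend is a convex combination of two masks bounded by 0 and 1, the resulting weights stay in that range, so applying them can only attenuate each time-frequency unit of the noisy signal.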

Published in EURASIP Journal on Advances in Signal Processing

ISSN: 1687-6172 (Print); 1687-6180 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication; Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics
Website: https://asp-eurasipjournals.springeropen.com
