A Novel Jointly Optimized Cooperative DAE-DNN Approach Based on a New Multi-Target Step-Wise Learning for Speech Enhancement

Matin Pashaian; Sanaz Seyedin; Seyed Mohammad Ahadi

doi:10.1109/ACCESS.2023.3250820

IEEE Access (Jan 2023)

Affiliations

Matin Pashaian: ORCiD; Department of Electrical Engineering, Speech Processing Research Laboratory, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Sanaz Seyedin: ORCiD; Department of Electrical Engineering, Speech Processing Research Laboratory, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Seyed Mohammad Ahadi: Department of Electrical Engineering, Speech Processing Research Laboratory, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran

DOI: https://doi.org/10.1109/ACCESS.2023.3250820
Journal volume & issue: Vol. 11, pp. 21669 – 21685

Abstract

In this paper, we present a new supervised speech enhancement approach based on a cooperative structure of deep autoencoders (DAEs), used as generative models, and deep neural networks (DNNs). The DAE serves as a nonlinear alternative to nonnegative matrix factorization (NMF) for extracting the harmonic structures and encoded features of the noise, clean, and noisy signals, and a DNN is deployed as a nonlinear mapper. We introduce a deep network that imitates NMF in a nonlinear manner to overcome the problems of a simple linear model, such as performance degradation in non-stationary environments. In contrast to combinatorial NMF and DNN methods, we perform the decomposition, enhancement, and reconstruction processes entirely in a nonlinear framework, via a suitable cooperative structure of encoder, DNN, and decoders, and jointly optimize them. We also propose a supervised hierarchical multi-target training approach, performed in two steps, in which the DNN predicts not only the low-level encoded features as primary targets but also the high-level actual spectral signals as secondary targets. The first step acts as pretraining for the second step, which improves the learning strategy. Moreover, to exploit a more discriminative model for noise reduction, a DNN-based noise classification and fusion (NCF) strategy is also proposed. Experiments on the TIMIT dataset show that the proposed methods outperform previous approaches and achieve an average perceptual evaluation of speech quality (PESQ) improvement of up to about 0.3 for speech enhancement.
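
The cooperative encoder, DNN, and decoder structure and the two-step multi-target objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the single-layer networks, the ReLU activations (chosen here to keep encoded features nonnegative, loosely mirroring NMF), the random placeholder data, and the weight `alpha` are all assumptions made for illustration, and only the forward pass and the two losses are shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: spectral bins, encoded-feature size, DNN hidden size.
N_FREQ, N_CODE, N_HIDDEN = 129, 32, 64


def relu(x):
    return np.maximum(x, 0.0)


def init_layer(n_in, n_out):
    # Small random weights and zero biases for one dense layer.
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def dense(x, layer):
    w, b = layer
    return relu(x @ w + b)


# Assumed layout: one encoder, a mapping DNN, and separate speech/noise decoders.
params = {
    "encoder": init_layer(N_FREQ, N_CODE),
    "dnn_hidden": init_layer(N_CODE, N_HIDDEN),
    "dnn_out": init_layer(N_HIDDEN, 2 * N_CODE),
    "speech_decoder": init_layer(N_CODE, N_FREQ),
    "noise_decoder": init_layer(N_CODE, N_FREQ),
}


def forward(params, noisy_spec):
    # Encode the noisy spectrum, map it to speech/noise codes, then decode.
    code_noisy = dense(noisy_spec, params["encoder"])
    hidden = dense(code_noisy, params["dnn_hidden"])
    codes = dense(hidden, params["dnn_out"])
    code_speech, code_noise = codes[:, :N_CODE], codes[:, N_CODE:]
    return {
        "code_speech": code_speech,
        "code_noise": code_noise,
        "spec_speech": dense(code_speech, params["speech_decoder"]),
        "spec_noise": dense(code_noise, params["noise_decoder"]),
    }


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def step1_loss(outputs, targets):
    # Step 1 (pretraining): primary targets only, i.e. the encoded features.
    return (mse(outputs["code_speech"], targets["code_speech"])
            + mse(outputs["code_noise"], targets["code_noise"]))


def step2_loss(outputs, targets, alpha=0.5):
    # Step 2: primary targets plus secondary (spectral) targets.
    # The weighting by alpha is an assumption of this sketch.
    secondary = (mse(outputs["spec_speech"], targets["spec_speech"])
                 + mse(outputs["spec_noise"], targets["spec_noise"]))
    return step1_loss(outputs, targets) + alpha * secondary


# Random placeholders standing in for real spectra and DAE-encoded targets.
BATCH = 8
noisy_spec = rng.random((BATCH, N_FREQ))
targets = {
    "code_speech": rng.random((BATCH, N_CODE)),
    "code_noise": rng.random((BATCH, N_CODE)),
    "spec_speech": rng.random((BATCH, N_FREQ)),
    "spec_noise": rng.random((BATCH, N_FREQ)),
}

outputs = forward(params, noisy_spec)
loss_step1 = step1_loss(outputs, targets)
loss_step2 = step2_loss(outputs, targets)
print("step 1 loss:", loss_step1)
print("step 2 loss:", loss_step2)
```

In a training loop, the parameters would first be optimized against `step1_loss` and then fine-tuned against `step2_loss`, so that the first step serves as pretraining for the second; the gradient computation is omitted here and would normally be delegated to an automatic-differentiation framework.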

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
