Low-complexity artificial noise suppression methods for deep learning-based speech enhancement algorithms

Yuxuan Ke; Andong Li; Chengshi Zheng; Renhua Peng; Xiaodong Li

doi:10.1186/s13636-021-00204-9

EURASIP Journal on Audio, Speech, and Music Processing (Apr 2021)

Affiliations

Yuxuan Ke: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences
Andong Li: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences
Chengshi Zheng: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences
Renhua Peng: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences
Xiaodong Li: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences

Journal volume & issue: Vol. 2021, no. 1, pp. 1–15

Abstract

Deep learning-based speech enhancement algorithms have shown a powerful ability to remove both stationary and non-stationary noise components from noisy speech observations. However, they often introduce artificial residual noise, especially when the training target does not contain phase information, e.g., the ideal ratio mask or the clean speech magnitude and its variants. It is well known that once the power of the residual noise components exceeds the noise masking threshold of the human auditory system, perceptual speech quality may degrade. An intuitive remedy is to further suppress the residual noise components with a postprocessing scheme. However, the highly non-stationary nature of this kind of residual noise makes noise power spectral density (PSD) estimation a challenging problem. To solve this problem, this paper proposes three strategies to estimate the noise PSD frame by frame; the residual noise can then be removed effectively by applying a gain function based on the decision-directed approach. Objective measurement results show that the proposed postfiltering strategies outperform the conventional postfilter in terms of segmental signal-to-noise ratio (SNR) as well as speech quality improvement. Moreover, an AB subjective listening test shows that the preference percentages for the proposed strategies exceed 60%.
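The abstract does not describe the three noise PSD estimation strategies, so the following is only a minimal sketch of the generic decision-directed step that such a postfilter builds on. It assumes the residual-noise PSD is already available per frame and bin, smooths the a priori SNR in the Ephraim–Malah decision-directed style, and then applies a Wiener gain with a floor. The function name `dd_wiener_postfilter`, the smoothing factor `alpha = 0.98`, the gain floor of 0.1, and the choice of a Wiener gain are illustrative assumptions, not the authors' exact postfilter.

```python
from typing import List, Optional, Sequence


def dd_wiener_postfilter(
    noisy_power: Sequence[Sequence[float]],
    noise_psd: Sequence[Sequence[float]],
    alpha: float = 0.98,
    gain_floor: float = 0.1,
    eps: float = 1e-12,
) -> List[List[float]]:
    """Hypothetical decision-directed postfilter returning gains[l][k].

    noisy_power[l][k]: |Y(k, l)|^2 of the postfilter input (e.g., the DNN output).
    noise_psd[l][k]:   residual-noise PSD lambda_d(k, l); ASSUMED to come from
                       some external estimator (not implemented here).
    """
    gains: List[List[float]] = []
    # |A_hat(k, l-1)|^2 / lambda_d(k, l-1): previous clean-power estimate over noise PSD
    prev_ratio: Optional[List[float]] = None

    for l, (y, lam) in enumerate(zip(noisy_power, noise_psd)):
        frame_gains: List[float] = []
        ratios: List[float] = []
        for k in range(len(y)):
            gamma = y[k] / max(lam[k], eps)      # a posteriori SNR
            ml_term = max(gamma - 1.0, 0.0)      # instantaneous (ML) a priori SNR estimate
            if prev_ratio is None:
                xi = ml_term                     # first frame: no previous estimate yet
            else:
                # Decision-directed smoothing of the a priori SNR
                xi = alpha * prev_ratio[k] + (1.0 - alpha) * ml_term
            g = max(xi / (1.0 + xi), gain_floor)  # Wiener gain with a lower floor
            frame_gains.append(g)
            ratios.append(g * g * gamma)         # = G^2 |Y|^2 / lambda_d for the next frame
        gains.append(frame_gains)
        prev_ratio = ratios
    return gains


if __name__ == "__main__":
    # Two frames, two frequency bins, unit noise PSD (toy values)
    gains = dd_wiener_postfilter([[4.0, 1.0], [4.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
    for l, row in enumerate(gains):
        print(l, [round(g, 4) for g in row])
```

Multiplying each bin of the input spectrum by `gains[l][k]` while keeping its phase gives the postfiltered spectrum. A larger `alpha` smooths the a priori SNR more heavily, which typically reduces musical-noise artifacts at the cost of a slower response to speech onsets.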

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com
