A Compact CNN-Based Speech Enhancement With Adaptive Filter Design Using Gabor Function and Region-Aware Convolution

Salinna Abdullah; Majid Zamani; Andreas Demosthenous

doi:10.1109/ACCESS.2022.3228744

IEEE Access (Jan 2022)

Affiliations

Salinna Abdullah: Department of Electronic and Electrical Engineering, University College London (UCL), London, U.K.
Majid Zamani: Department of Electronic and Electrical Engineering, University College London (UCL), London, U.K.
Andreas Demosthenous: Department of Electronic and Electrical Engineering, University College London (UCL), London, U.K.

Journal volume & issue: Vol. 10
pp. 130657 – 130671

Abstract

Speech enhancement (SE) is used in many applications, such as hearing devices, to improve speech intelligibility and quality. Convolutional neural network-based (CNN-based) SE algorithms in the literature often employ generic convolutional filters that are not optimized for SE applications. This paper presents a CNN-based SE algorithm with an adaptive filter design (named ‘CNN-AFD’) using the Gabor function and region-aware convolution. The proposed algorithm incorporates fixed Gabor functions into its convolutional filters to model human auditory processing and thereby improve denoising performance. The feature maps obtained from the Gabor-incorporated convolutional layers serve as learnable guided masks (tuned during backpropagation) for generating adaptive, custom region-aware filters. These custom filters extract features from speech regions (hence ‘region-aware’) while maintaining translation invariance. To reduce the high inference cost of the CNN, skip convolution and pruning based on activation analysis are explored. Employing skip convolution reduced the training time per epoch by close to 40%. Pruning neurons with high numbers of zero activations complements skip convolution and reduces the number of model parameters by more than 30%. When tasked to denoise speech contaminated with NOISEX-92 noises at −5, 0 and 5 dB signal-to-noise ratios (SNRs), the proposed CNN-AFD outperformed all four CNN-based SE baseline algorithms (i.e., a CNN-based SE employing generic filters, a CNN-based SE without region-aware convolution, a CNN-based SE trained with complex spectrograms and a CNN-based SE processing in the time domain), with average short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ) and logarithmic spectral distance (LSD) scores of 0.95, 1.82 and 0.82, respectively.
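The abstract does not give the exact filter parameterization or pruning rule, so the following is only a minimal NumPy sketch, under stated assumptions, of two ideas it names. The first is a real 2-D Gabor kernel (a Gaussian envelope times a cosine carrier, one standard form of the Gabor function) that is fixed and multiplies learnable convolution weights elementwise. The second is channel pruning by the fraction of zero activations after ReLU, similar to the average-percentage-of-zeros criterion. The function names, the elementwise modulation, the 0.9 threshold and the (batch, channels, frequency, time) layout are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def gabor_kernel(size, sigma, theta, lam, psi=0.0, gamma=1.0):
    """Real 2-D Gabor kernel: Gaussian envelope times a cosine carrier.

    size: odd kernel side length; sigma: Gaussian width (taps);
    theta: orientation (rad); lam: carrier wavelength (taps);
    psi: phase offset; gamma: envelope aspect ratio.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + (gamma * y_r)**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_r / lam + psi)
    return envelope * carrier


def gabor_modulated_filters(weights, gabor):
    """Hypothetical: scale every learnable kernel elementwise by a fixed Gabor kernel.

    weights: (out_ch, in_ch, k, k) learnable weights; gabor: (k, k) fixed kernel.
    Only `weights` would be updated during training; `gabor` stays constant.
    """
    return weights * gabor[np.newaxis, np.newaxis, :, :]


def zero_activation_ratio(activations):
    """Fraction of exactly-zero outputs per channel.

    activations: (batch, channels, freq, time) post-ReLU feature maps.
    """
    return np.mean(activations == 0.0, axis=(0, 2, 3))


def channels_to_keep(activations, threshold=0.9):
    """Indices of channels whose zero-activation ratio does not exceed threshold."""
    ratio = zero_activation_ratio(activations)
    return np.flatnonzero(ratio <= threshold)


# Usage on random data (illustrative only).
rng = np.random.default_rng(0)
g = gabor_kernel(size=5, sigma=1.5, theta=0.0, lam=4.0)
w = rng.standard_normal((8, 1, 5, 5))
filters = gabor_modulated_filters(w, g)

acts = np.maximum(rng.standard_normal((4, 8, 16, 16)), 0.0)  # ReLU outputs
acts[:, 3] = 0.0  # force channel 3 to be "dead" for the example
keep = channels_to_keep(acts, threshold=0.9)  # channel 3 is dropped
```

In this sketch the Gabor kernel stays fixed while the weights it multiplies remain trainable, which biases each filter toward a localized, oriented pattern. A channel whose post-ReLU outputs are almost always zero contributes little to the next layer, which is the intuition behind removing it.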

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
