EURASIP Journal on Audio, Speech, and Music Processing (Feb 2021)

A CNN-based approach to identification of degradations in speech signals

  • Yuki Saishu,
  • Amir Hossein Poorjam,
  • Mads Græsbøll Christensen

DOI
https://doi.org/10.1186/s13636-021-00198-4
Journal volume & issue
Vol. 2021, no. 1
pp. 1–10

Abstract

The presence of degradations in speech signals, which causes an acoustic mismatch between training and operating conditions, deteriorates the performance of many speech-based systems. A variety of enhancement techniques have been developed to compensate for this mismatch; to apply them, however, prior information about the presence and the type of degradation in the signal is required. In this paper, we propose a new convolutional neural network (CNN)-based approach to automatically identify the major types of degradation commonly encountered in speech-based applications, namely additive noise, nonlinear distortion, and reverberation. In this approach, a set of parallel CNNs, each detecting a particular degradation type, is applied to the log-mel spectrogram of the audio signal. Experimental results on two different speech types, namely pathological voice and normal running speech, show the effectiveness of the proposed method in detecting the presence and the type of degradation, outperforming the state-of-the-art method. Using score-weighted class activation mapping, we provide a visual analysis of how the network makes its decisions, highlighting the regions of the log-mel spectrogram that are most influential for the target degradation.
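The pipeline the abstract describes — a log-mel spectrogram front-end feeding a set of parallel per-degradation detectors — can be sketched as follows. This is a structural sketch, not the paper's implementation: all front-end hyperparameters (16 kHz sample rate, 512-point FFT, 40 mel bands) are illustrative assumptions, and the trained CNNs are replaced by placeholder linear scorers so the example stays self-contained.

```python
import numpy as np

def mel_filterbank(sr=16000, n_fft=512, n_mels=40):
    """Triangular mel filterbank (parameters are assumptions, not the paper's)."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0.0), hz2mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):                      # rising slope
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):                      # falling slope
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Windowed power STFT -> mel filterbank -> log, i.e. the CNN input."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2  # (n_frames, n_fft//2+1)
    return np.log(mel_filterbank(sr, n_fft, n_mels) @ power.T + 1e-10)

class DegradationDetector:
    """Stand-in for one binary CNN; a fixed linear scorer is used purely to
    illustrate the one-detector-per-degradation structure."""
    def __init__(self, name, seed, n_mels=40):
        self.name = name
        self.w = np.random.default_rng(seed).normal(size=n_mels)  # placeholder weights
    def score(self, logmel):
        # Average over time, scale down, squash to a probability-like score.
        z = float(self.w @ logmel.mean(axis=1)) / logmel.shape[0]
        return 1.0 / (1.0 + np.exp(-z))

# Three detectors run in parallel on the same log-mel input, one per
# degradation type named in the paper.
detectors = [DegradationDetector(n, s) for s, n in
             enumerate(["additive_noise", "nonlinear_distortion", "reverberation"])]
t = np.arange(16000) / 16000.0
logmel = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))  # 1 s test tone
scores = {d.name: d.score(logmel) for d in detectors}
```

In the paper's setting each placeholder scorer would be a trained CNN, but the surrounding plumbing is the same: one shared log-mel representation, independent binary decisions per degradation type.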

Keywords