Robust Detection of Background Acoustic Scene in the Presence of Foreground Speech

Siyuan Song; Yanjue Song; Nilesh Madhu

doi:10.3390/app14020609

Applied Sciences (Jan 2024)

Affiliations

Siyuan Song: IDLab, Department of Electronics and Information Systems, Ghent University—imec, 9000 Ghent, Belgium
Yanjue Song: IDLab, Department of Electronics and Information Systems, Ghent University—imec, 9000 Ghent, Belgium
Nilesh Madhu: IDLab, Department of Electronics and Information Systems, Ghent University—imec, 9000 Ghent, Belgium

DOI: https://doi.org/10.3390/app14020609
Journal volume & issue: Vol. 14, no. 2
p. 609

Abstract

The characteristic sound that an Acoustic Scene Classification (ASC) system relies on is contained in the ambient signal. In practice, however, this signal is often distorted, e.g., by foreground speech from speakers in the surroundings. Previously, based on the iVector framework, we proposed different strategies to improve the classification accuracy when foreground speech is present. In this paper, we extend these methods to deep-learning (DL)-based ASC systems to improve their robustness to foreground speech. ResNet models are proposed as the baseline, in combination with multi-condition training at different signal-to-background ratios (SBRs). For further robustness, we first investigate the noise-floor-based Mel-FilterBank Energies (NF-MFBE) as the input feature of the ResNet model. Next, speech presence information obtained from a speech enhancement (SE) system is incorporated within the ASC framework. As the speech presence information is time-frequency specific, it allows the network to learn to distinguish better between background signal regions and foreground speech. While the proposed modifications improve the performance of ASC systems when foreground speech is dominant, performance is slightly worse in scenarios with low-level or absent foreground speech. Therefore, as a last consideration, ensemble methods are introduced to integrate classification scores from different models in a weighted manner. The experimental study systematically validates the contribution of each proposed modification and shows that, for the final system with the proposed input features and meta-learner, the classification accuracy is improved at all tested SBRs. In particular, for SBRs of 20 dB, absolute improvements of up to 9% can be obtained.
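As a rough illustration of two ideas mentioned in the abstract, the sketch below mixes foreground speech into a background recording at a chosen SBR (as one might when preparing multi-condition training data) and fuses per-class scores from several models with fixed weights. It is a minimal sketch and not the authors' implementation: the function names `mix_at_sbr` and `fuse_scores` are hypothetical, the SBR is assumed to be the ratio of speech power to background power in dB, and the fixed-weight average is a simplified stand-in for the meta-learner described in the paper.

```python
import math


def power(samples):
    """Mean power of a sequence of audio samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_sbr(speech, background, sbr_db):
    """Scale the speech so that speech power / background power matches
    sbr_db (assumed definition of SBR), then add it to the background.

    Returns the mixture and the gain that was applied to the speech.
    """
    if len(speech) != len(background):
        raise ValueError("speech and background must have the same length")
    p_speech, p_background = power(speech), power(background)
    if p_speech == 0 or p_background == 0:
        raise ValueError("both signals must have non-zero power")
    # Target speech power is p_background * 10^(sbr_db / 10).
    gain = math.sqrt(p_background / p_speech * 10 ** (sbr_db / 10))
    mixture = [b + gain * s for s, b in zip(speech, background)]
    return mixture, gain


def fuse_scores(score_sets, weights):
    """Weighted average of per-class scores from several models.

    Fixed weights are an assumption made for illustration; a trained
    meta-learner would learn how to combine the scores instead.
    """
    total = sum(weights)
    n_classes = len(score_sets[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, score_sets)) / total
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]
    background = [0.5, 0.5, -0.5, -0.5]
    mixture, gain = mix_at_sbr(speech, background, sbr_db=20.0)
    scaled_speech = [gain * s for s in speech]
    achieved = 10 * math.log10(power(scaled_speech) / power(background))
    print(f"achieved SBR: {achieved:.1f} dB")
    print(fuse_scores([[0.6, 0.4], [0.2, 0.8]], weights=[3, 1]))
```

In this sketch the same background clip can be reused at several `sbr_db` values to produce training material covering a range of conditions, and the fused scores can be passed to an arg-max to pick the scene class.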

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
