Synthetic data generation techniques for training deep acoustic siren identification networks

Stefano Damiano; Benjamin Cramer; Andre Guntoro; Toon van Waterschoot

doi:10.3389/frsip.2024.1358532

Frontiers in Signal Processing (Jul 2024)

Affiliations

Stefano Damiano: STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, Leuven, Belgium
Benjamin Cramer: Robert Bosch GmbH, Renningen, Germany
Andre Guntoro: Robert Bosch GmbH, Renningen, Germany
Toon van Waterschoot: STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, Leuven, Belgium

DOI: https://doi.org/10.3389/frsip.2024.1358532
Journal volume & issue: Vol. 4

Abstract

Acoustic sensing has been widely exploited for the early detection of harmful situations in urban environments: in particular, several siren identification algorithms based on deep neural networks have been developed and have proven robust to the noisy and non-stationary urban acoustic scene. Although high classification accuracy can be achieved when training and evaluating on the same dataset, the cross-dataset performance of such models remains unexplored. To build robust models that generalize well to unseen data, large datasets that capture the diversity of the target sounds are needed, but collecting them is generally expensive and time-consuming. To overcome this limitation, in this work we investigate synthetic data generation techniques for training siren identification models. To obtain siren source signals, we either collect a small set of stationary, recorded siren sounds from public sources, or generate them synthetically. We then simulate source motion, acoustic propagation and the Doppler effect, and finally combine the resulting signal with background noise. In this way, we build two synthetic datasets used to train three different convolutional neural networks, which are then tested on real-world datasets unseen during training. We show that the proposed training strategy based on the use of recorded source signals and synthetic acoustic propagation performs best. In particular, this method leads to models that exhibit a better generalization ability, compared with training and evaluating in a cross-dataset setting. Moreover, the proposed method loosens the data collection requirement and is entirely built using publicly available resources.
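
The generation pipeline outlined in the abstract (source signal, source motion with acoustic propagation and Doppler effect, then background noise) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sinusoidal frequency sweep used as a siren-like source, the straight-line pass-by geometry, the free-field 1/r attenuation, the linear interpolation, and the white Gaussian noise standing in for recorded urban background are all assumptions chosen to keep the example short.

```python
import numpy as np

FS = 16_000   # sample rate in Hz (assumed for this sketch)
C = 343.0     # speed of sound in air, m/s


def synth_siren(duration, f_lo=600.0, f_hi=1400.0, sweep_rate=0.25, fs=FS):
    """Hypothetical siren-like source: a tone whose frequency sweeps sinusoidally."""
    t = np.arange(int(duration * fs)) / fs
    inst_freq = f_lo + 0.5 * (f_hi - f_lo) * (1 + np.sin(2 * np.pi * sweep_rate * t))
    phase = 2 * np.pi * np.cumsum(inst_freq) / fs  # integrate frequency to get phase
    return np.sin(phase)


def simulate_pass_by(source, speed=15.0, closest=5.0, fs=FS, c=C):
    """Free-field propagation from a source driving past a fixed microphone.

    The source moves on a straight line at `speed` (m/s) and passes the
    microphone at distance `closest` (m) halfway through the signal.
    """
    t = np.arange(len(source)) / fs
    x = speed * (t - t[-1] / 2)            # position along the road
    dist = np.sqrt(x**2 + closest**2)      # source-to-microphone distance
    arrival = t + dist / c                 # when each emitted sample reaches the mic
    # Resampling the emitted samples onto the uniform receiver time grid applies
    # the time-varying delay, which produces the Doppler shift. `arrival` is
    # strictly increasing as long as speed < c, as np.interp requires.
    return np.interp(t, arrival, source / dist, left=0.0, right=0.0)


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that the signal-to-noise ratio equals `snr_db`, then add it."""
    sig_pow = np.mean(signal**2)
    noise_pow = np.mean(noise**2)
    gain = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return signal + gain * noise


rng = np.random.default_rng(0)
siren = synth_siren(duration=4.0)
moving = simulate_pass_by(siren, speed=15.0, closest=5.0)
noise = rng.standard_normal(len(moving))   # placeholder for recorded background noise
example = mix_at_snr(moving, noise, snr_db=0.0)
```

In a setup like the one the abstract describes, the synthetic source could be swapped for a stationary recorded siren and the white noise for recorded urban background, while the propagation step stays the same.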

Published in Frontiers in Signal Processing

ISSN: 2673-8198 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://www.frontiersin.org/journals/signal-processing
