Proceedings of the XXth Conference of Open Innovations Association FRUCT (Sep 2020)
Comparative Assessment of Data Augmentation for Semi-Supervised Polyphonic Sound Event Detection
Abstract
In the context of audio ambient intelligence systems in Smart Buildings, polyphonic Sound Event Detection aims at detecting, localizing, and classifying any sound event recorded in a room. Today, most models are based on Deep Learning and require large databases for training. We propose a CRNN system that exploits unlabeled data through semi-supervised learning based on the "Mean teacher" method, combined with data augmentation to overcome the limited size of the training dataset and to further improve performance. This model was submitted to the DCASE 2019 challenge and ranked second. In the present study, several conventional data augmentation techniques are compared: time shifting, frequency shifting, and background noise addition. It is shown that data augmentation with time shifting and noise addition, in combination with class-dependent median filtering, improves performance by 9%, leading to an event-based F1-score of 43.2% on the DCASE 2019 validation set. However, the tools used so far for data augmentation generally rely on a coarse model (i.e. random variation of data) of the intra-class variability observed in real life. This raises the question of whether incorporating acoustic knowledge into the design of augmentation methods would be advantageous. A physics-inspired approach is outlined for future work.
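The time-shifting and noise-addition augmentations compared in the study can be sketched as operations on a time-frequency representation. The sketch below is illustrative only: the paper's actual shift ranges, noise levels, and feature pipeline are not specified in the abstract, so the parameter choices (maximum shift of 16 frames, 20 dB SNR) and function names are assumptions.

```python
import numpy as np

def time_shift(spec, max_shift=16, rng=None):
    """Randomly roll a (freq, time) spectrogram along the time axis.

    A coarse model of intra-class variability: the same event simply
    occurs earlier or later in the clip (circular shift for simplicity).
    max_shift is an illustrative value, not the paper's setting."""
    rng = rng or np.random.default_rng()
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(spec, shift, axis=1)

def add_noise(spec, snr_db=20.0, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(spec ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=spec.shape)
    return spec + noise

# Example: augment a dummy 64-mel-band, 500-frame spectrogram.
rng = np.random.default_rng(0)
spec = rng.random((64, 500))
augmented = add_noise(time_shift(spec, rng=rng), snr_db=20.0, rng=rng)
print(augmented.shape)  # (64, 500)
```

In practice such transforms are applied on the fly during training, so each epoch sees a slightly different variant of every clip; frequency shifting would be the analogous roll along axis 0.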
Keywords