Complex & Intelligent Systems (May 2024)
Sound event detection in traffic scenes based on graph convolutional network to obtain multi-modal information
Abstract
Sound event detection involves identifying sound categories in audio and determining when they start and end. In real-life situations, however, sound events are rarely isolated: when one sound event occurs, related sound events often co-occur with it or follow it in succession. The timing relationship between sound events therefore reflects their characteristics. This paper proposes a sound event detection method for traffic scenes based on a graph convolutional network, which treats this timing relationship as a form of multi-modal information. The proposed method uses the acoustic event window method to extract co-occurrence and successive-occurrence relationships from the sound signal while filtering out relationships that are likely to be noise. This relationship information is then represented as a graph structure. Next, the graph convolutional network is improved to balance the weight of each node's neighbors against the weight of the node itself and to avoid over-smoothing; the improved network is used to learn the relationship information in the graph structure. Finally, a convolutional recurrent neural network learns the acoustic features of sound events, and multi-modal fusion incorporates the relationship information of sound events to enhance detection performance. The experimental results show that using multi-modal information with the proposed method effectively improves model performance and enhances the ability of smart cars to perceive their surrounding environment while driving.
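The two graph-related steps summarized above can be illustrated with a minimal sketch. This is not the paper's implementation: the window rule (two events are related if the gap between them is at most `window` seconds), the `min_count` noise threshold, and the `self_weight` parameter are all assumptions introduced here for illustration, and the function names are hypothetical.

```python
import numpy as np

def cooccurrence_graph(events, classes, window=2.0, min_count=2):
    """Count class pairs that overlap or follow each other within `window` seconds.

    events: list of (label, onset, offset) tuples in seconds (assumed format).
    Returns a symmetric count matrix; pairs seen fewer than `min_count`
    times are zeroed out as likely noise (assumed filtering rule).
    """
    idx = {c: i for i, c in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)))
    for i, (la, on_a, off_a) in enumerate(events):
        for lb, on_b, off_b in events[i + 1:]:
            if la == lb:
                continue
            # Gap between the two intervals; zero or negative when they overlap.
            gap = max(on_a, on_b) - min(off_a, off_b)
            if gap <= window:
                counts[idx[la], idx[lb]] += 1
                counts[idx[lb], idx[la]] += 1
    counts[counts < min_count] = 0
    return counts

def gcn_layer(adj, feats, weight, self_weight=1.0):
    """One graph-convolution step with an adjustable self-loop weight.

    `self_weight` (must be > 0) scales how much each node keeps of its own
    features relative to its neighbors; this is an assumed stand-in for the
    neighbor/self balancing described in the abstract.
    """
    a_hat = adj + self_weight * np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    # Symmetric normalization: D^{-1/2} (A + self_weight * I) D^{-1/2}
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)  # ReLU activation
```

With made-up annotations such as two horn events, each followed shortly by a brake event, plus a distant siren, `cooccurrence_graph` links horn and brake and leaves siren isolated. A larger `self_weight` keeps each node's representation closer to its own features, which is one simple way to limit smoothing across neighbors.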
Keywords