Complex & Intelligent Systems (May 2024)

Sound event detection in traffic scenes based on graph convolutional network to obtain multi-modal information

  • Yanji Jiang,
  • Dingxu Guo,
  • Lan Wang,
  • Haitao Zhang,
  • Hao Dong,
  • Youli Qiu,
  • Huiwen Zou

DOI
https://doi.org/10.1007/s40747-024-01463-7
Journal volume & issue
Vol. 10, no. 4
pp. 5653 – 5668

Abstract


Sound event detection involves identifying the categories of sounds in audio and determining when each sound starts and ends. In real-life situations, however, sound events are usually not isolated: when one sound event occurs, related sound events often occur alongside it or follow it. This timing relationship between sound events reflects their characteristics. This paper therefore proposes a sound event detection method for traffic scenes based on a graph convolutional network, which treats the timing relationship as a form of multi-modal information. The method first uses an acoustic event window to extract co-occurrence and successive-occurrence relationship information from the sound signal, filtering out likely spurious relationships, and represents this information as a graph structure. The graph convolutional network is then improved to balance the relationship weights between each node and its neighbors and to avoid over-smoothing, and is used to learn the relationship information encoded in the graph. Finally, a convolutional recurrent neural network learns the acoustic features of the sound events, and the acoustic and relationship information are combined by multi-modal fusion to improve detection performance. Experimental results show that using this multi-modal information effectively improves the model's performance and enhances the ability of smart cars to perceive their surrounding environment while driving.
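The abstract does not state the exact update rule of the improved graph convolution, only that it balances the relationship weights between a node and its neighbors to limit over-smoothing. As a rough, non-authoritative sketch of that idea (the `alpha` balance parameter, the toy co-occurrence adjacency, and the single-layer setup are illustrative assumptions, not the paper's formulation), one layer might look like:

```python
import numpy as np

def gcn_layer(H, A, W, alpha=0.5):
    """One graph-convolution step with a self/neighbor balance term.

    H     : (N, F)  node features (e.g. sound-event embeddings)
    A     : (N, N)  adjacency built from co-occurrence / succession relations
    W     : (F, F') weight matrix
    alpha : weight on a node's own features versus its neighbors';
            keeping alpha > 0 limits over-smoothing across layers.
    """
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # sym. normalize
    H_mix = (1 - alpha) * (A_norm @ H) + alpha * H    # balance neighbors vs self
    return np.maximum(H_mix @ W, 0.0)                 # ReLU

# Toy example: three sound-event classes linked by co-occurrence
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.default_rng(0).normal(size=(3, 4))
W = np.eye(4)
out = gcn_layer(H, A, W)
print(out.shape)  # (3, 4)
```

With `alpha = 0`, this reduces to a standard normalized graph convolution; raising `alpha` preserves more of each node's original representation, which is one common way to counteract the feature-averaging effect that repeated graph convolutions produce.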

Keywords