Jisuanji kexue (Mar 2023)

Sound Event Joint Estimation Method Based on Three-dimension Convolution

  • MEI Pengcheng, YANG Jibin, ZHANG Qiang, HUANG Xiang

DOI
https://doi.org/10.11896/jsjkx.220500259
Journal volume & issue
Vol. 50, no. 3
pp. 191 – 198

Abstract

Sound event localization and detection (SELD) is widely used in monitoring and anomaly detection tasks. Deep learning methods, represented by convolutional recurrent neural networks (CRNN), can improve SELD performance. To improve system localization and detection performance, a method based on 3D convolution feature extraction, called SELD3Dnet, is proposed. The amplitude and phase spectra of the input multi-channel acoustic signal are calculated, and a deep feature representation is extracted by multiple 3D convolution modules. Recurrent neural networks and fully connected layers are then used to estimate the types of sound events and their locations. When processing multi-channel acoustic signals, three-dimensional (3D) convolution operates over time, frequency and signal channel simultaneously, so that the correlation between signal channels can be exploited to the maximum extent. Comparative experiments are conducted on the TUT2018 and TAU2019 datasets, and the results show that the overall performance of the proposed method is significantly improved on the TUT2018 REAL and TAU2019 MREAL datasets. On TUT2018 REAL, the F1 score for sound event detection improves by 13.9% and frame accuracy by 21.1%; on TAU2019 MREAL, the F1 score improves by 10.8% and frame accuracy by 14.4%. These results verify that the proposed method can effectively overcome the influence of reverberation present in real-life scenes.
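
The front end described in the abstract (amplitude and phase spectra followed by a convolution that spans channel, time and frequency together) can be illustrated with a minimal NumPy sketch. This is not the SELD3Dnet implementation: the frame length, hop size, kernel size, the random test signal, the stacking of amplitude and phase along the channel axis, and the helper names `stft_amp_phase` and `conv3d_valid` are all assumptions made for illustration.

```python
import numpy as np


def stft_amp_phase(signal, frame_len=64, hop=32):
    """Amplitude and phase spectra, each shaped (channels, frames, bins).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    _, n_samples = signal.shape
    window = np.hanning(frame_len)
    n_frames = 1 + (n_samples - frame_len) // hop
    # Slice overlapping frames for every channel at once.
    frames = np.stack(
        [signal[:, i * hop:i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )  # (channels, frames, frame_len)
    spec = np.fft.rfft(frames, axis=-1)  # (channels, frames, frame_len // 2 + 1)
    return np.abs(spec), np.angle(spec)


def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D cross-correlation of a (C, T, F) volume with one kernel."""
    windows = np.lib.stride_tricks.sliding_window_view(volume, kernel.shape)
    # windows has shape (C-kc+1, T-kt+1, F-kf+1, kc, kt, kf).
    return np.einsum("ctfijk,ijk->ctf", windows, kernel)


rng = np.random.default_rng(0)
audio = rng.standard_normal((4, 1024))  # hypothetical 4-channel recording

amp, phase = stft_amp_phase(audio)
# Assumption: amplitude and phase are stacked along the channel axis.
features = np.concatenate([amp, phase], axis=0)  # (8, frames, bins)

kernel = rng.standard_normal((3, 3, 3))  # spans channel, time and frequency
out = conv3d_valid(features, kernel)
print(features.shape, out.shape)
```

Because the kernel extends across the channel axis as well as time and frequency, each output value mixes information from neighbouring channels, which is the inter-channel correlation the abstract refers to. A 2D convolution would instead collapse the channel axis in a single step at the first layer.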

Keywords