IEEE Access (Jan 2020)
An Audio Data Representation for Traffic Acoustic Scene Recognition
Abstract
Acoustic scene recognition (ASR), recognizing acoustic environments given an audio recording of the scene, has a wide range of applications, e.g. robotic navigation and audio forensics. However, ASR remains challenging, mainly due to the difficulty of representing audio data. In this article, we focus on traffic acoustic data. Traffic acoustic scene recognition provides information complementary to the visual information of the scene; for example, it can be used to verify visual perception results. Because acoustic analysis and recognition is simple and convenient, it can effectively enhance the perception ability of systems that rely on visual information alone. We propose an audio data representation method to improve traffic acoustic scene recognition accuracy. The proposed method employs the constant-Q transform (CQT) and the histogram of oriented gradients (HOG) to transform one-dimensional audio signals into a time-frequency representation. We also propose two data representation mechanisms, called global and local feature selection, to select features that describe the shape of time-frequency structures. Finally, we exploit the least absolute shrinkage and selection operator (LASSO) to further improve recognition accuracy by selecting the most representative information. We conducted extensive experiments, and the results show that the proposed method is effective, significantly outperforming state-of-the-art methods.
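The pipeline sketched in the abstract (time-frequency image → gradient-orientation features → LASSO-style sparse selection) can be illustrated in miniature. The sketch below is an assumption-laden toy, not the authors' implementation: `hog_like_features` computes a single magnitude-weighted orientation histogram over a time-frequency matrix (real HOG also uses cell/block pooling and normalization), and `soft_threshold` is the LASSO proximal operator that shrinks small coefficients to exactly zero, which is how LASSO discards uninformative features.

```python
import numpy as np

def hog_like_features(tf, n_bins=8):
    """HOG-style sketch: magnitude-weighted orientation histogram
    over a 2-D time-frequency matrix (frequency x time)."""
    gy, gx = np.gradient(tf)                     # gradients along freq and time axes
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    for b in range(n_bins):
        hist[b] = mag[bins == b].sum()           # magnitude-weighted votes per bin
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist     # L2-normalized descriptor

def soft_threshold(w, lam):
    """LASSO proximal operator: coefficients with |w| <= lam become exactly 0,
    which is the mechanism behind LASSO's feature selection."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
```

A toy spectrogram whose energy rises smoothly with frequency produces a descriptor concentrated in one orientation bin, and soft thresholding zeroes out weak coefficients while shrinking the rest.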
Keywords