A CNN Sound Classification Mechanism Using Data Augmentation

Hung-Chi Chu; Young-Lin Zhang; Hao-Chu Chiang

doi:10.3390/s23156972

Sensors (Aug 2023)

Affiliations

Hung-Chi Chu: Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 41349, Taiwan
Young-Lin Zhang: Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 41349, Taiwan
Hao-Chu Chiang: Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 41349, Taiwan

DOI: https://doi.org/10.3390/s23156972
Journal volume & issue: Vol. 23, no. 15, p. 6972

Abstract

Sound classification is widely used in many fields. Compared with traditional signal-processing methods, deep learning is among the most feasible and effective approaches to sound classification. However, classification performance is limited by the quality of the training dataset, which is affected by cost and resource constraints, data imbalance, and data annotation issues. We therefore propose a sound classification mechanism based on convolutional neural networks (CNNs) that uses Mel-Frequency Cepstral Coefficient (MFCC) feature extraction to convert sound signals into spectrograms, which are suitable as input for CNN models. For data augmentation, we increase the number of spectrograms by varying the number of triangular bandpass filters. Experimental results show that on the ESC-50 dataset, which has 50 semantic categories, complex sound types, and an insufficient amount of data, the classification accuracy is only 63%. With the proposed data augmentation method (K = 5), the accuracy increases to 97%. On the UrbanSound8K dataset, where the amount of data is sufficient, the classification accuracy reaches 90% and increases slightly to 92% with data augmentation. Moreover, when only 50% of the training dataset is used together with data augmentation, training of the model is accelerated and the classification accuracy still reaches 91%.
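
The augmentation idea in the abstract, producing several MFCC representations of the same sound by changing the number of triangular bandpass filters, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single-frame simplification, the 13 coefficients, and the five filter counts (20, 26, 32, 40, 48) are all assumptions chosen for illustration, and the abstract does not state which filter counts were used or exactly how K is defined. The sketch uses only the Python standard library and a naive DFT, so it is suitable for short demo frames only.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Return n_filters triangular filters, each a list of n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 points equally spaced on the mel scale (edges and centres).
    hz_points = [mel_to_hz(mel_max * i / (n_filters + 1))
                 for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                row.append((f - left) / (centre - left))    # rising edge
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def power_spectrum(frame):
    """Naive DFT power spectrum of one frame (slow, but dependency-free)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mfcc_frame(frame, sample_rate, n_filters, n_coeffs=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_energy = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                  for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energy))
            for c in range(n_coeffs)]

def augment(frame, sample_rate, filter_counts):
    """One MFCC vector per filter count: several views of the same sound."""
    return {n: mfcc_frame(frame, sample_rate, n) for n in filter_counts}

# Demo input (assumed): a 440 Hz tone, 256 samples at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(256)]

# Five hypothetical filter-count settings, giving five variants of one frame.
variants = augment(frame, SAMPLE_RATE, (20, 26, 32, 40, 48))
```

In a full pipeline, each variant would be computed over every frame of a clip and stacked into a two-dimensional image before being passed to the CNN; a library such as librosa would normally replace the hand-written DFT and filterbank.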

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
