IEEE Access (Jan 2023)

Development of Parametric Filter Banks for Sound Feature Extraction

  • Xiangyu Cai
  • Sunwoo Ko

DOI
https://doi.org/10.1109/ACCESS.2023.3321798
Journal volume & issue
Vol. 11
pp. 109856 – 109867

Abstract

This paper proposes learnable parametric filter banks. A parametric filter bank is obtained by selecting learnable parameters from an original filter bank and using the learning ability of a neural network to fit those parameters to the current dataset. We use three types of filter banks: the popular Mel filter bank, the Gammatone filter bank, which mimics the response of the human auditory filter in the cochlea, and our own Gaussian filter bank. The parametric filter banks are evaluated on Audio-MNIST, a speech recognition dataset of spoken digits, and on Ten Languages, a self-created dataset of news speech in ten different languages. Comparative experiments are conducted with both a Convolutional Neural Network (CNN) and a Fully Connected Neural Network (FCNN) as classifiers. The experimental results show that the parametric filter banks outperform the original filter banks, and that the parametric Gammatone filter bank achieves the highest accuracy: 98.77% on the Audio-MNIST dataset and 92.14% on the Ten Languages test data. Because the number of samples per class differs across the dataset, we also use the weighted average F1-score as an evaluation metric to further confirm the performance of the model, with maximum values of 0.99 and 0.92, respectively.
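
The weighted average F1-score mentioned above can be illustrated with a minimal sketch. It assumes the standard definition, in which each class's F1-score is weighted by that class's share of the true labels; the abstract does not give the authors' exact computation, and the function name `weighted_f1` is hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Assumption: standard definition; not taken from the paper itself.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives and total predictions for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score
```

With imbalanced classes, this weighting keeps a rare class from counting as much as a frequent one, which is presumably why it is reported alongside accuracy here.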

Keywords