IEEE Access (Jan 2021)

Matching Pursuit and Sparse Coding for Auditory Representation

  • Dung Kim Tran,
  • Masashi Unoki

DOI
https://doi.org/10.1109/ACCESS.2021.3135011
Journal volume & issue
Vol. 9
pp. 167084 – 167095

Abstract


Previous studies have revealed that an auditory representation obtained by mimicking the neural activity patterns of the auditory periphery to extract perceptual features of speech signals is beneficial to speech-coding and pattern-analysis applications in comparison with spectrogram and spikegram representations. However, current solutions use outdated techniques such as the Bark scale and gammatone basis to decompose speech signals. We propose a method that uses more physiologically accurate techniques: the equivalent rectangular bandwidth scale, a gammachirp basis, and the auditory masking effects of gammachirp kernels. Our experimental results indicate that the auditory representation created with the proposed method requires the lowest bitrate (1066 coefficients per second on average) to achieve similar perceptual evaluation scores (0.89 PEMO-Q and 3.27 PESQ) compared with spectrogram and spikegram representations. The proposed method also provides the highest matching accuracy with a pattern-matching algorithm.
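As a rough illustration of the ingredients named in the abstract, the sketch below spaces center frequencies on the equivalent rectangular bandwidth (ERB) scale, builds gammachirp-shaped kernels, and runs a plain greedy matching pursuit over them. It is not the authors' implementation: the function names, the kernel duration, and the parameter values (`n=4`, `b=1.019`, `c=-2.0`) are assumptions chosen for illustration, the ERB formulas are the commonly used Glasberg and Moore approximations, and the auditory masking step is omitted entirely. The signal is assumed to be at least as long as each kernel.

```python
import numpy as np

def erb_bandwidth(fc_hz):
    # Glasberg & Moore approximation of the ERB (in Hz) at center frequency fc_hz.
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def erb_space(f_lo, f_hi, n):
    # n center frequencies equally spaced on the ERB-number scale.
    to_erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    return from_erb(np.linspace(to_erb(f_lo), to_erb(f_hi), n))

def gammachirp(fc_hz, fs, dur=0.025, n=4, b=1.019, c=-2.0):
    # Gammachirp-shaped kernel: gamma envelope times a carrier with a
    # c*ln(t) chirp term. Parameter values are illustrative assumptions.
    t = np.arange(1, int(dur * fs) + 1) / fs  # start at 1/fs to avoid log(0)
    env = t ** (n - 1) * np.exp(-2 * np.pi * b * erb_bandwidth(fc_hz) * t)
    g = env * np.cos(2 * np.pi * fc_hz * t + c * np.log(t))
    return g / np.linalg.norm(g)  # unit norm, as matching pursuit assumes

def matching_pursuit(signal, kernels, n_iter):
    # Greedy matching pursuit: at each step pick the (kernel, shift) pair
    # with the largest absolute correlation and subtract its projection.
    residual = np.asarray(signal, dtype=float).copy()
    spikes = []  # list of (kernel index, sample position, amplitude)
    for _ in range(n_iter):
        best = None
        for k, g in enumerate(kernels):
            corr = np.correlate(residual, g, mode="valid")
            pos = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[pos]) > abs(best[2]):
                best = (k, pos, corr[pos])
        k, pos, amp = best
        residual[pos:pos + len(kernels[k])] -= amp * kernels[k]
        spikes.append((k, pos, float(amp)))
    return spikes, residual

# Usage: recover a single scaled, shifted kernel from a synthetic signal.
fs = 16000
center_freqs = erb_space(100.0, 4000.0, 8)
kernels = [gammachirp(f, fs) for f in center_freqs]
signal = np.zeros(2000)
signal[300:300 + len(kernels[3])] += 2.0 * kernels[3]
spikes, residual = matching_pursuit(signal, kernels, n_iter=1)
```

Each returned triple plays the role of one coefficient in the sparse code, so the number of triples per second of audio corresponds to the "coefficients per second" measure quoted in the abstract.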

Keywords