IEEE Access (Jan 2024)

Exploring the Effectiveness of Feature Reduction and Kernel-Based Matching for Query-by-Example Spoken Term Detection Using CNN

  • Manisha Naik Gaonkar,
  • Veena Thenkanidiyoor,
  • Dileep Aroor Dinesh,
  • H. Muralikrishna

DOI
https://doi.org/10.1109/ACCESS.2024.3520605
Journal volume & issue
Vol. 12
pp. 194462 – 194474

Abstract

Query-by-example spoken term detection (QbE-STD) is the task of searching for an audio query in a repository of audio utterances. A common approach computes a matching matrix between the feature representations of the query and the reference utterance, then decides the relevance of the reference utterance to the query from that matrix. The time required to compute the matching matrix is crucial, since one must be computed between the query and every reference utterance, and this time depends on the number of feature representations in the query and the reference utterance. Feature reduction lowers the number of feature representations and thereby shortens the time required to compute a matching matrix. In this study, we explore feature reduction in combination with kernel-based matching of the reduced feature representations of the query and reference utterances. We decide the relevance of a reference utterance by applying a convolutional neural network (CNN) based classifier to the matching matrix. We demonstrate that the proposed approach not only reduces search time but also increases the accuracy of QbE-STD.
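
To illustrate why fewer feature representations mean a cheaper matching matrix, here is a minimal sketch in Python. The abstract does not say which reduction scheme or kernel the paper uses, so the choices here are assumptions made only for illustration: averaging fixed-size windows of consecutive frames as the reduction, and a Gaussian kernel as the matching function. The names reduce_features, gaussian_kernel, and matching_matrix, along with the window and gamma parameters, are hypothetical. The CNN classifier that would consume the matrix is omitted.

```python
import math


def reduce_features(frames, window):
    """Average each run of `window` consecutive frames into one vector.

    Assumed reduction scheme for illustration; not taken from the paper.
    """
    reduced = []
    for start in range(0, len(frames), window):
        chunk = frames[start:start + window]
        dim = len(chunk[0])
        reduced.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return reduced


def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) similarity between two feature vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def matching_matrix(query, reference, kernel=gaussian_kernel):
    """Rows index query frames, columns index reference frames."""
    return [[kernel(q, r) for r in reference] for q in query]


# Toy 2-dimensional "features": 4 query frames and 8 reference frames.
query = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 1.0]]
reference = [[float(i), float(i)] for i in range(8)]

full = matching_matrix(query, reference)

# Halving the frame count on both sides quarters the number of kernel evaluations.
small = matching_matrix(reduce_features(query, 2), reduce_features(reference, 2))

print(len(full), len(full[0]))    # 4 8
print(len(small), len(small[0]))  # 2 4
```

With a window of 2 on both sides, the matrix shrinks from 4 x 8 = 32 entries to 2 x 4 = 8, which is the kind of saving that matters when a matrix must be computed for every reference utterance in the repository.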

Keywords