IEEE Access (Jan 2024)

Contextual Biasing for End-to-End Chinese ASR

  • Kai Zhang,
  • Qiuxia Zhang,
  • Chung-Che Wang,
  • Jyh-Shing Roger Jang

DOI
https://doi.org/10.1109/ACCESS.2024.3424260
Journal volume & issue
Vol. 12
pp. 92960 – 92975

Abstract

The end-to-end speech recognition approach is more robust than conventional methods and improves recognition accuracy across diverse contexts. However, because it lacks an independent language model, it struggles to recognize vocabulary absent from the training data, which degrades recognition of certain domain-specific terms. Adapting such models to different scenarios therefore requires biasing them toward specific domains. Based on the CATSLU dataset, this study constructed two Chinese contextual biasing tasks, one targeting proper nouns and one targeting mixed-domain sentences. It further explored four contextual biasing methods applied at different stages of the speech recognition process: before recognition, within the model, during decoding, and in post-processing. Experimental results indicate that every biasing method improved, to some extent, the model's recognition performance within specific domains.
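The abstract names post-processing as one of the four biasing stages but does not describe how the authors implement it. As a purely hypothetical illustration of what post-processing biasing can look like, the sketch below scans an ASR hypothesis for spans that differ from a biasing phrase by at most a few character substitutions (a common error pattern for Chinese homophones) and replaces them with the phrase. The function names `bias_correct` and `char_substitutions`, the threshold `max_subs`, and the example phrases are all assumptions, not the paper's method.

```python
# Hypothetical post-processing contextual biasing sketch (not the paper's method).
# Assumption: recognition errors on bias phrases are same-length character
# substitutions, so equal-length windows are compared position by position.


def char_substitutions(a: str, b: str) -> int:
    """Count position-wise character mismatches between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def bias_correct(hyp: str, bias_phrases: list[str], max_subs: int = 1) -> str:
    """Replace near-miss spans of `hyp` with matching bias phrases.

    Longer phrases are tried first, and replaced spans are locked so that a
    shorter phrase cannot later overwrite part of a longer one.
    """
    chars = list(hyp)
    locked = [False] * len(chars)
    for phrase in sorted(set(bias_phrases), key=len, reverse=True):
        n = len(phrase)
        # Skip phrases so short that any window would fall within the threshold.
        if n <= max_subs:
            continue
        i = 0
        while i + n <= len(chars):
            if any(locked[i:i + n]):
                i += 1
                continue
            window = "".join(chars[i:i + n])
            if char_substitutions(window, phrase) <= max_subs:
                # Exact matches and near misses are both rewritten and locked.
                chars[i:i + n] = list(phrase)
                locked[i:i + n] = [True] * n
                i += n
            else:
                i += 1
    return "".join(chars)


if __name__ == "__main__":
    print(bias_correct("导航到人民广常", ["人民广场"]))  # → 导航到人民广场
```

Because the threshold is absolute, short phrases (e.g. two characters with `max_subs=1`) would match too eagerly; a practical variant would likely scale the threshold with phrase length or compare pinyin rather than characters.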

Keywords