EURASIP Journal on Audio, Speech, and Music Processing (Jun 2024)

Exploration of Whisper fine-tuning strategies for low-resource ASR

  • Yunpeng Liu,
  • Xukui Yang,
  • Dan Qu

DOI
https://doi.org/10.1186/s13636-024-00349-3
Journal volume & issue
Vol. 2024, no. 1
pp. 1 – 11

Abstract


Limited data availability remains a significant challenge for Whisper's speech recognition in low-resource languages, where its performance falls short of practical application requirements. While previous studies have successfully reduced recognition error rates on target-language speech through fine-tuning, a comprehensive exploration and analysis of Whisper's fine-tuning capabilities, and of the advantages and disadvantages of different fine-tuning strategies, is still lacking. This paper aims to fill this gap by experimentally evaluating Whisper's low-resource speech recognition performance under five fine-tuning strategies, using limited supervised data from seven low-resource languages. The results and analysis demonstrate that all five strategies significantly improve Whisper's performance. However, the strategies differ in their suitability and practical effectiveness, so the choice of strategy should be based on the specific use case and the resources available.
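Improvements in ASR are usually reported as reductions in recognition error rate. As a point of reference, and assuming word error rate (WER) as the metric, WER is the minimum number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below is a generic illustration, not the paper's evaluation code: the function name `word_error_rate` is hypothetical, and whitespace tokenization is an assumption (a character error rate would apply the same computation to characters).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Generic WER sketch: (substitutions + deletions + insertions) / reference words.

    Assumption: words are obtained by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.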

Keywords