Enhancing Automatic Speech Recognition With Personalized Models: Improving Accuracy Through Individualized Fine-Tuning

Vitalii Brydinskyi; Dmytro Sabodashko; Yuriy Khoma; Michal Podpora; Alexander Konovalov; Volodymyr Khoma

doi:10.1109/ACCESS.2024.3443811

IEEE Access (Jan 2024)

Affiliations

Vitalii Brydinskyi: Institute of Computer Technologies, Automation and Metrology, Lviv Polytechnic National University, Lviv, Ukraine
Dmytro Sabodashko: Institute of Computer Technologies, Automation and Metrology, Lviv Polytechnic National University, Lviv, Ukraine
Yuriy Khoma: Institute of Computer Technologies, Automation and Metrology, Lviv Polytechnic National University, Lviv, Ukraine
Michal Podpora: Department of Computer Science, Opole University of Technology, Opole, Poland
Alexander Konovalov: Vidby AG, Risch-Rotkreuz, Switzerland
Volodymyr Khoma: Department of Control Engineering, Opole University of Technology, Opole, Poland

DOI: https://doi.org/10.1109/ACCESS.2024.3443811
Journal volume & issue: Vol. 12, pp. 116649–116656

Abstract

Automatic speech recognition (ASR) systems have become increasingly popular in recent years due to their ability to convert spoken language into text. Nonetheless, despite their widespread use, existing speaker-independent ASR systems frequently encounter challenges related to variations in speaking styles, accents, and vocal characteristics, leading to potential recognition inaccuracies. This study delves into the feasibility of personalized ASR systems that adapt to the unique voice attributes of individual speakers, thereby enhancing recognition accuracy. It provides an overview of our methodology, focusing on the design, development, and evaluation of both speaker-independent and personalized ASR systems. The dataset used included diverse speakers selected from three extensive datasets: TedLIUM-3, CommonVoice, and GoogleVoice, demonstrating the capability of our methodology to accommodate various accents and challenges of both natural and synthetic voices. In terms of signal classification and interpretation, the personalized model eclipsed the speaker-independent variant, registering an enhancement of up to ~3% for natural voices and ~10% for synthetic voices in recognition accuracy for individual speakers. Our findings demonstrate that personalized ASR systems can significantly improve the accuracy of speech recognition for individual speakers and highlight the importance of adapting ASR models to individual voices.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
