Code-Mixed Street Address Recognition and Accent Adaptation for Voice-Activated Navigation Services

Syed Meesam Raza Naqvi; Muhammad Ali Tahir; Kamran Javed; Hassan Aqeel Khan; Ali Raza; Zubair Saeed

doi:10.1109/ACCESS.2024.3496617

IEEE Access (Jan 2024)

Affiliations

Syed Meesam Raza Naqvi: FEMTO-ST Institute, Université de Franche-Comté/SUPMICROTECH-ENSMM, Besançon, France
Muhammad Ali Tahir: School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad, Pakistan
Kamran Javed: Department of Computer Engineering, National University of Technology (NUTECH), Islamabad, Pakistan
Hassan Aqeel Khan: Department of Applied Artificial Intelligence and Robotics, School of Computer Science and Digital Technologies, Aston University, Birmingham, U.K.
Ali Raza: Department of Computer Engineering, National University of Technology (NUTECH), Islamabad, Pakistan
Zubair Saeed: Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA

DOI: https://doi.org/10.1109/ACCESS.2024.3496617
Journal volume & issue: Vol. 12, pp. 168393–168411

Abstract

This study presents a real-time, application-specific Automatic Speech Recognition (ASR) system for voice-activated navigation services. The system is designed to recognize Urdu-English code-mixed street addresses, a challenging task because of their complex structure and because Urdu is an under-resourced language. Two corpora were collected for ASR system development: a Unicode Urdu corpus of general Urdu recordings (around 61.82 hours from 144 speakers) and a Roman Urdu-English code-mixed street address corpus (around 16.89 hours from 20 speakers). The Unicode Urdu data gives the acoustic models general coverage of the language, while the code-mixed street addresses provide coverage of code-mixing and code-switching. The hybrid ASR system employed in this study is central to addressing the challenges of this low-resource setting (only 16.89 hours of task-specific data), especially for Urdu-English code-switching. The study compares several acoustic models; a mixed Time Delay Neural Network and Long Short-Term Memory (TDNN-LSTM) model performs best, with a Word Error Rate (WER), Character Error Rate (CER), and Sentence Error Rate (SER) of 4.02%, 0.8%, and 15.14%, respectively, on random street addresses. In addition to street address testing, we performed accent-based and manual decoding tests on the developed ASR system. The results indicate the need to develop and deploy custom ASR systems for better accent adaptation and application-specific coverage. The developed ASR system is integrated into the TPL Maps (https://tplmaps.com/) mobile application. It is Pakistan’s first real-time Large Vocabulary Continuous Speech Recognition (LVCSR) system to provide Urdu-based voice-activated navigation services.
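For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how WER, CER, and SER are commonly computed from reference and hypothesis transcripts. It is not the authors' evaluation code. The function names are hypothetical, the sample addresses are made up for illustration, and two conventions are assumed: CER counts spaces as characters, and all rates are pooled over the whole test set rather than averaged per utterance.

```python
from typing import Dict, List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rates(refs: List[str], hyps: List[str]) -> Dict[str, float]:
    """Return pooled WER, CER, and SER (in percent) over paired transcripts."""
    word_errs = word_total = char_errs = char_total = sent_errs = 0
    for ref, hyp in zip(refs, hyps):
        ref_words, hyp_words = ref.split(), hyp.split()
        word_errs += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        # Assumption: characters include spaces; some toolkits strip them.
        char_errs += edit_distance(ref, hyp)
        char_total += len(ref)
        # A sentence counts as wrong if any word differs.
        sent_errs += int(ref_words != hyp_words)
    return {
        "WER": 100.0 * word_errs / word_total,
        "CER": 100.0 * char_errs / char_total,
        "SER": 100.0 * sent_errs / len(refs),
    }


if __name__ == "__main__":
    # Made-up Roman Urdu-English style addresses, for illustration only.
    references = ["house 12 street 5 g 9 islamabad", "main boulevard gulberg lahore"]
    hypotheses = ["house 12 street 5 g 9 islamabad", "main boulevard gulbarg lahore"]
    print(error_rates(references, hypotheses))
```

A single misrecognized word makes the whole sentence count as an error, which is one reason SER (15.14%) can be much higher than WER (4.02%) on the same test set.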

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
