Using Character-Level Sequence-to-Sequence Model for Word Level Text Generation to Enhance Arabic Speech Recognition

Mona A. Azim; Wedad Hussein; Nagwa L. Badr

doi:10.1109/ACCESS.2023.3302257

IEEE Access (Jan 2023)

Affiliations

Mona A. Azim: Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
Wedad Hussein: Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
Nagwa L. Badr: Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Journal volume & issue: Vol. 11, pp. 91173–91183

Abstract

Owing to the linguistic richness of Arabic, which has more than 6000 roots, building a reliable language model for Arabic speech recognition systems is challenging. This paper introduces a language-model-free automatic speech recognition system for Modern Standard Arabic, based on the end-to-end Deep Speech architecture developed by Mozilla. The proposed system uses a character-level sequence-to-sequence model to map the character alignment produced by the recognizer onto the corresponding words. It outperformed recent studies on both multi-speaker and single-speaker Arabic speech recognition on two state-of-the-art datasets. The first was the Arabic Multi-Genre Broadcast (MGB2) corpus, with 1200 hours of audio data from multiple speakers. On this corpus, the system achieved a word error rate (WER) of 3.2, a new milestone in the MGB2 challenge, outperforming related work on the same corpus with a word error reduction of 17%. The second was the 7-hour Saudi Accent Single Speaker Corpus (SASSC), which was used to train a separate model for single-male-speaker Arabic speech recognition with the same network architecture. This single-speaker model outperformed related experiments with a WER of 4.25, a relative improvement of 33.8%.
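
The abstract reports results as word error rates and relative reductions. As a minimal sketch of how these two quantities are conventionally computed (not the authors' evaluation code), the Python below derives WER from a word-level edit distance and the relative reduction from two WER values. The function names, the sample sentences, and the numbers in the usage lines are illustrative assumptions and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (not results from the paper).
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
gain = relative_reduction(baseline_wer=10.0, new_wer=8.0)
```

With these made-up inputs, the hypothesis has one substituted and one deleted word against a six-word reference, so the WER is 2/6; the relative reduction from 10.0 to 8.0 is 0.2, that is, 20%.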

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
