IEEE Access (Jan 2024)
qArI: A Hybrid CTC/Attention-Based Model for Quran Recitation Recognition Using Bidirectional LSTMP in an End-to-End Architecture
Abstract
Accurate speech recognition of the Holy Quran is crucial for maintaining the traditional recitation styles and pronunciations, which helps preserve the authenticity of the Quranic teachings and ensure their accurate transmission across generations. Although recently developed models have achieved highly accurate results for spoken and written Arabic and non-Arabic speech recognition, research on the Holy Quran is still in its early stages. Indeed, Holy Quran speech recognition (HQSR) presents several challenges, including the complexity of the language and the absence of a comprehensive dataset. This research aims to improve the accuracy of speech recognition models for the recitation of the Holy Quran. A new dataset called the comprehensive Quranic dataset version 1 (CQDV1) is created to serve the HQSR field. The dataset is publicly available for use by other researchers and includes recitations of the entire Quran (114 surahs, recited by 35 reciters) based on the Hafs 'an Asim narration. The study explores the development of a speech recognition model for the accurate recitation of the Holy Quran. The model combines a connectionist temporal classification (CTC)/attention loss function with a Bidirectional Long Short-Term Memory with projections (BLSTMP) architecture and a token-based recurrent neural network language model (RNNLM), trained on the CQDV1 dataset. The results achieved were a token error rate (TER) of 6.4%, a word error rate (WER) of 10.4%, and a sentence error rate (SER) of 55.3% with $\lambda = 0.2$.
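The hybrid objective mentioned above is commonly formed by interpolating the CTC and attention losses with a weight $\lambda$; the abstract reports $\lambda = 0.2$. A minimal sketch of this interpolation, assuming the standard formulation $L = \lambda L_{\mathrm{CTC}} + (1-\lambda) L_{\mathrm{att}}$ (function and variable names here are illustrative, not from the paper's code):

```python
def hybrid_loss(ctc_loss: float, attention_loss: float, lam: float = 0.2) -> float:
    """Hybrid CTC/attention objective: L = lam * L_CTC + (1 - lam) * L_att.

    lam (lambda) balances the alignment-enforcing CTC branch against the
    attention branch; lam = 0.2 matches the setting reported in the abstract.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * attention_loss


# Example: with lambda = 0.2 the attention branch dominates the objective.
combined = hybrid_loss(ctc_loss=3.0, attention_loss=1.0, lam=0.2)
print(combined)  # 0.2 * 3.0 + 0.8 * 1.0 = 1.4
```

A small $\lambda$ keeps the monotonic-alignment regularization that CTC provides during training while letting the more flexible attention decoder drive most of the gradient signal.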
Keywords