Correct Pronunciation Detection of the Arabic Alphabet Using Deep Learning

Nishmia Ziafat; Hafiz Farooq Ahmad; Iram Fatima; Muhammad Zia; Abdulaziz Alhumam; Kashif Rajpoot

doi:10.3390/app11062508

Applied Sciences (Mar 2021)

Affiliations

Nishmia Ziafat: COMSIP Lab, Department of Electronics, Quaid-I-Azam University, Islamabad 45320, Pakistan
Hafiz Farooq Ahmad: Computer Science Department, College of Computer Sciences and Information Technology (CCSIT), King Faisal University, P.O. Box 400, Al-Ahsa 31982, Saudi Arabia
Iram Fatima: Computer Science Department, College of Computer Sciences and Information Technology (CCSIT), King Faisal University, P.O. Box 400, Al-Ahsa 31982, Saudi Arabia
Muhammad Zia: COMSIP Lab, Department of Electronics, Quaid-I-Azam University, Islamabad 45320, Pakistan
Abdulaziz Alhumam: Computer Science Department, College of Computer Sciences and Information Technology (CCSIT), King Faisal University, P.O. Box 400, Al-Ahsa 31982, Saudi Arabia
Kashif Rajpoot: School of Electrical Engineering and Computer Science, NUST, Islamabad 44000, Pakistan

DOI: https://doi.org/10.3390/app11062508
Journal volume & issue: Vol. 11, no. 6, p. 2508

Abstract

Automatic speech recognition for Arabic poses unique challenges, and progress in this domain has been relatively slow. Classical Arabic in particular has received even less research attention. The correct pronunciation of the letters of the Arabic alphabet has significant implications for the meaning of words. In this work, we design learning models that classify letters of the Arabic alphabet based on their correct pronunciation, which remains a challenging task for the research community. We divide the problem into two steps. First, we train a model to recognize a letter, namely Arabic alphabet classification. Second, we train a model to determine the quality of its pronunciation, namely Arabic alphabet pronunciation classification. Because little audio data of this kind is available, we collected audio data from experts and novices to train our models. To train these models, we extract pronunciation features from the audio data of the Arabic alphabet using mel-spectrograms. We employ a deep convolutional neural network (DCNN), AlexNet with transfer learning, and bidirectional long short-term memory (BLSTM), a type of recurrent neural network (RNN), to classify the audio data. For alphabet classification, DCNN, AlexNet, and BLSTM achieve an accuracy of 95.95%, 98.41%, and 88.32%, respectively. For Arabic alphabet pronunciation classification, DCNN, AlexNet, and BLSTM achieve an accuracy of 97.88%, 99.14%, and 77.71%, respectively.
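The mel-spectrogram feature extraction mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. The abstract does not state the authors' settings, so everything here is an assumption made for illustration: the function names, the Hann window, the frame size (`n_fft=256`), the hop (`hop=128`), the number of mel bands (`n_mels=20`), and the HTK-style mel formula. A real pipeline would more likely use an FFT-based audio library than the naive DFT shown.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert Hz to mel (HTK-style formula; an assumed choice)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build n_mels triangular filters over the n_fft // 2 + 1 DFT bins."""
    n_bins = n_fft // 2 + 1
    # n_mels + 2 edge points, equally spaced on the mel scale up to Nyquist.
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum of one Hann-windowed frame."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Return a list of frames, each holding n_mels log-mel energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([10.0 * math.log10(e + 1e-10) for e in energies])
    return frames


# Usage on a synthetic 440 Hz tone (a stand-in for a recorded letter).
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(1024)]
spec = mel_spectrogram(tone, sample_rate)
print(len(spec), "frames x", len(spec[0]), "mel bands")
```

The resulting frames-by-bands array is the kind of image-like input a convolutional classifier can consume, or that a recurrent model can read one frame at a time.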

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
