IEEE Access (Jan 2019)

Mispronunciation Detection Using Deep Convolutional Neural Network Features and Transfer Learning-Based Model for Arabic Phonemes

  • Faria Nazir,
  • Muhammad Nadeem Majeed,
  • Mustansar Ali Ghazanfar,
  • Muazzam Maqsood

DOI
https://doi.org/10.1109/ACCESS.2019.2912648
Journal volume & issue
Vol. 7
pp. 52589–52608

Abstract

Computer-assisted language learning (CALL) systems provide an automated framework to identify mispronunciation and give useful feedback. Traditionally, handcrafted acoustic-phonetic features are used to detect mispronunciation. Departing from this line of research, this paper investigates the use of deep convolutional neural networks (CNNs) for mispronunciation detection of Arabic phonemes. We propose two methods: a convolutional neural network features (CNN_Features)-based technique and a transfer learning-based technique for mispronunciation detection. In the first method, we use deep CNN features to detect mispronunciation; features extracted from different layers of the CNN (layer4 to layer7) are used to train k-nearest neighbor (KNN), support vector machine (SVM), and neural network (NN) classifiers. In the transfer learning-based method, the CNN is trained using transfer learning to detect mispronunciation. To evaluate the performance of the system, we compare the results of these methods with a baseline handcrafted features-based method for 28 Arabic phonemes. The baseline method uses the same classifiers (KNN, SVM, and NN) to detect mispronunciation. The experimental results show that the handcrafted_features method, the CNN_Features method, and the transfer learning-based method achieve accuracies of 82%, 91.7%, and 92.2%, respectively. The performance analysis shows that the transfer learning-based method outperforms both the handcrafted_features and CNN_Features-based methods. The proposed transfer learning-based method also outperforms state-of-the-art techniques in terms of accuracy.
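The abstract describes two pipelines: using intermediate CNN activations as features for conventional classifiers, and fine-tuning a pretrained CNN via transfer learning. The following is a minimal sketch of those two ideas, not the authors' exact setup: the AlexNet backbone, scikit-learn's SVC, the specific layer taken as "layer7", the input sizing, and the synthetic data are all assumptions for illustration.

```python
# Sketch of the CNN_Features idea: spectrogram-like inputs pass through a
# pretrained CNN, activations from an intermediate layer are used as features,
# and a conventional classifier (here an SVM) is trained on them.
import torch
import torchvision.models as models
from sklearn.svm import SVC

# Pretrained backbone used only as a fixed feature extractor (assumption:
# an AlexNet-style network stands in for the paper's CNN).
backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
backbone.eval()

def extract_features(batch):
    """Return fc7-style activations (assumed analogue of 'layer7')."""
    with torch.no_grad():
        x = backbone.features(batch)          # convolutional stages
        x = backbone.avgpool(x).flatten(1)    # pool and flatten
        x = backbone.classifier[:6](x)        # stop before the final FC layer
    return x.numpy()

# Hypothetical data: 3-channel spectrogram images (N, 3, 224, 224) with
# binary labels (0 = correct pronunciation, 1 = mispronounced).
train_images = torch.rand(16, 3, 224, 224)
train_labels = [0, 1] * 8

svm = SVC(kernel="rbf")
svm.fit(extract_features(train_images), train_labels)

# Transfer-learning variant (also a sketch): instead of an external classifier,
# replace the final layer with a 2-class head and fine-tune the network.
backbone.classifier[6] = torch.nn.Linear(4096, 2)
```

In practice, features from several layers (the abstract's layer4 to layer7) would be extracted the same way by truncating the forward pass at different points, and each feature set would be fed to the KNN, SVM, and NN classifiers for comparison.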

Keywords