IEEE Access (Jan 2019)

Mispronunciation Detection Using Deep Convolutional Neural Network Features and Transfer Learning-Based Model for Arabic Phonemes

  • Faria Nazir,
  • Muhammad Nadeem Majeed,
  • Mustansar Ali Ghazanfar,
  • Muazzam Maqsood

DOI
https://doi.org/10.1109/ACCESS.2019.2912648
Journal volume & issue
Vol. 7
pp. 52589–52608

Abstract

Computer-assisted language learning (CALL) systems provide an automated framework to identify mispronunciation and give useful feedback. Traditionally, handcrafted acoustic-phonetic features are used to detect mispronunciation. Departing from this line of research, this paper investigates the use of deep convolutional neural networks (CNNs) for mispronunciation detection of Arabic phonemes. We propose two methods: a convolutional neural network features (CNN_Features)-based technique and a transfer learning-based technique for mispronunciation detection. In the first method, we use deep CNN features to detect mispronunciation; features extracted from different layers of the CNN (layer4 to layer7) are used to train k-nearest neighbor (KNN), support vector machine (SVM), and neural network (NN) classifiers. In the transfer learning-based method, the CNN is trained using transfer learning to detect mispronunciation. To evaluate the performance of the system, we compare the results of these methods with a baseline handcrafted features-based method for 28 Arabic phonemes. The baseline method uses the same classifiers (KNN, SVM, and NN) to detect mispronunciation. The experimental results show that the handcrafted_features method, the CNN_Features method, and the transfer learning-based method achieve accuracies of 82%, 91.7%, and 92.2%, respectively. The performance analysis shows that the transfer learning-based method outperforms both the handcrafted_features and CNN_Features-based methods. The proposed transfer learning-based method also outperforms state-of-the-art techniques in terms of accuracy.
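The abstract describes two pipelines: using intermediate CNN activations as features for conventional classifiers, and fine-tuning a pretrained CNN via transfer learning. The following is a minimal sketch of those two ideas, not the authors' exact setup: the AlexNet backbone, scikit-learn's SVC, the specific layer taken as "layer7", the input sizing, and the synthetic data are all assumptions for illustration.

```python
# Sketch of the CNN_Features idea: spectrogram-like inputs pass through a
# pretrained CNN, activations from an intermediate layer are used as features,
# and a conventional classifier (here an SVM) is trained on them.
import torch
import torchvision.models as models
from sklearn.svm import SVC

# Pretrained backbone used only as a fixed feature extractor (assumption:
# an AlexNet-style network stands in for the paper's CNN).
backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
backbone.eval()

def extract_features(batch):
    """Return fc7-style activations (assumed analogue of 'layer7')."""
    with torch.no_grad():
        x = backbone.features(batch)          # convolutional stages
        x = backbone.avgpool(x).flatten(1)    # pool and flatten
        x = backbone.classifier[:6](x)        # stop before the final FC layer
    return x.numpy()

# Hypothetical data: 3-channel spectrogram images (N, 3, 224, 224) with
# binary labels (0 = correct pronunciation, 1 = mispronounced).
train_images = torch.rand(16, 3, 224, 224)
train_labels = [0, 1] * 8

svm = SVC(kernel="rbf")
svm.fit(extract_features(train_images), train_labels)

# Transfer-learning variant (also a sketch): instead of an external classifier,
# replace the final layer with a 2-class head and fine-tune the network.
backbone.classifier[6] = torch.nn.Linear(4096, 2)
```

In practice, features from several layers (the abstract's layer4 to layer7) would be extracted the same way by truncating the forward pass at different points, and each feature set would be fed to the KNN, SVM, and NN classifiers for comparison.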

Keywords