Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning

Mohammad Ali Humayun; Ibrahim A. Hameed; Syed Muslim Shah; Sohaib Hassan Khan; Irfan Zafar; Saad Bin Ahmed; Junaid Shuja

doi:10.3390/app9091956

Applied Sciences (May 2019)

Affiliations

Mohammad Ali Humayun: Department of Electrical Engineering, University of Engineering and Technology Peshawar, Institute of Communication Technologies (ICT) Campus, Islamabad 44000, Pakistan
Ibrahim A. Hameed: Department of ICT and Natural Sciences, Faculty of Information Technology and Electrical Engineering, Norwegian University of Science and Technology, 6001 Alesund, Norway
Syed Muslim Shah: Department of Electrical Engineering, University of Engineering and Technology Peshawar, Institute of Communication Technologies (ICT) Campus, Islamabad 44000, Pakistan
Sohaib Hassan Khan: Department of Electrical Engineering, University of Engineering and Technology Peshawar, Institute of Communication Technologies (ICT) Campus, Islamabad 44000, Pakistan
Irfan Zafar: Department of Electrical Engineering, University of Engineering and Technology Peshawar, Institute of Communication Technologies (ICT) Campus, Islamabad 44000, Pakistan
Saad Bin Ahmed: Malaysia-Japan International Institute of Technology (M-JIIT), Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, Kuala Lumpur 54100, Malaysia
Junaid Shuja: Department of Computer Sciences, COMSATS University Islamabad, Abbottabad Campus, Abbottabad 22010, Pakistan

DOI: https://doi.org/10.3390/app9091956
Journal volume & issue: Vol. 9, no. 9, p. 1956

Abstract

Automatic Speech Recognition (ASR) has achieved its best results for English with end-to-end, neural-network-based supervised models. These supervised models need large amounts of labeled speech data to generalize well, which is difficult to obtain for low-resource languages like Urdu. Most models proposed for Urdu ASR are based on Hidden Markov Models (HMMs). This paper proposes an end-to-end neural network model for Urdu ASR, regularized with dropout, ensemble averaging, and Maxout units. Dropout and ensembling are techniques that average over multiple neural network models, while Maxout units are neural network units that adapt their activation functions. Because labeled data is limited, Semi-Supervised Learning (SSL) techniques are also incorporated to improve model generalization. Speech features are transformed onto a lower-dimensional manifold using an unsupervised dimensionality-reduction technique called Locally Linear Embedding (LLE). The transformed data, along with the higher-dimensional features, are used to train the neural networks. The proposed model also uses label-propagation-based self-training of the initially trained models, and it achieves a Word Error Rate (WER) 4% lower than the benchmark reported on the same Urdu corpus using HMMs. The decrease in WER after incorporating SSL is more significant with an increased validation data size.
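As an illustration of what it means for a Maxout unit to adapt its activation function, the following minimal sketch computes the standard Maxout definition, the maximum over several affine pieces of the input. It is not the authors' implementation; the function name `maxout` and the hand-picked example weights are assumptions chosen only for illustration.

```python
def maxout(x, weights, biases):
    """Output of one Maxout unit: the maximum of k affine pieces w_j . x + b_j.

    x       -- input feature vector (list of floats)
    weights -- k weight vectors, one per affine piece
    biases  -- k scalar biases, one per affine piece
    """
    pieces = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return max(pieces)


# Illustrative, hand-picked parameters (not learned, not from the paper).
# Pieces with slopes +1 and -1 make the unit behave like |x|.
abs_w, abs_b = [[1.0], [-1.0]], [0.0, 0.0]
# Pieces with slopes 1 and 0 make the unit behave like a ReLU.
relu_w, relu_b = [[1.0], [0.0]], [0.0, 0.0]

print(maxout([-3.0], abs_w, abs_b))
print(maxout([-3.0], relu_w, relu_b))
```

Because the shape of the activation is determined by the weights of the pieces, training those weights changes the activation function itself rather than only its input.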

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
