Developing a Speech Recognition System for Recognizing Tonal Speech Signals Using a Convolutional Neural Network

Sakshi Dua; Sethuraman Sambath Kumar; Yasser Albagory; Rajakumar Ramalingam; Ankur Dumka; Rajesh Singh; Mamoon Rashid; Anita Gehlot; Sultan S. Alshamrani; Ahmed Saeed AlGhamdi

doi:10.3390/app12126223

Applied Sciences (Jun 2022)

Affiliations

Sakshi Dua: School of Computer Applications, Lovely Professional University, Jalandhar 144402, India
Sethuraman Sambath Kumar: School of Computer Applications, Lovely Professional University, Jalandhar 144402, India
Yasser Albagory: Department of Computer Engineering, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
Rajakumar Ramalingam: Department of Computer Science & Technology, MITS, Madanapalle 517325, India
Ankur Dumka: Department of Computer Science & Engineering, Women Institute of Technology, Dehradun 248007, India
Rajesh Singh: Department of Research and Development, Uttaranchal Institute of Technology, Uttaranchal University, Dehradun 248007, India
Mamoon Rashid: Department of Computer Engineering, Faculty of Science and Technology, Vishwakarma University, Pune 411048, India
Anita Gehlot: Department of Research and Development, Uttaranchal Institute of Technology, Uttaranchal University, Dehradun 248007, India
Sultan S. Alshamrani: Department of Information Technology, College of Computer and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
Ahmed Saeed AlGhamdi: Department of Computer Engineering, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia

DOI: https://doi.org/10.3390/app12126223
Journal volume & issue: Vol. 12, no. 12, p. 6223

Abstract

Deep learning models have shown significant results in speech recognition and in numerous vision-related tasks. The performance of the speech-to-text model presented here depends on the hyperparameters chosen for it. This work shows that convolutional neural networks (CNNs) can model raw and tonal speech signals, with performance on par with existing recognition systems. The study extends the CNN-based approach to robust and uncommon (tonal) speech signals, using a database designed specifically for this research. The main objective was to develop a speech-to-text recognition system that recognizes the tonal speech signals of Gurbani hymns using a CNN. The CNN model comprised six layers of 2D convolution and 2D max pooling together with 256 dense-layer units, implemented with Google's TensorFlow, and Praat was used for speech segmentation. Feature extraction was performed using the Mel-frequency cepstral coefficient (MFCC) technique, which extracts standard speech features as well as features of the background music. Our study reveals that the CNN-based method for identifying tonal speech sentences and adding instrumental knowledge performs better than existing and conventional approaches. The experimental results show that the present CNN architecture achieves an accuracy of 89.15% and a word error rate (WER) of 10.56% on continuous, extensive-vocabulary sentences of speech signals with different tones.
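
The abstract reports a word error rate (WER) but does not say how it was computed. As a minimal sketch, assuming the standard definition (word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words), WER could be calculated as below. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption: whitespace tokenization; the paper's own tokenization
    and scoring tool are not stated in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution and one deletion
    # against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, insertions can push WER above 100%, so WER and accuracy need not sum to exactly 100%.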

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
