Visual Speech Recognition for Kannada Language Using VGG16 Convolutional Neural Network

Shashidhar Rudregowda; Sudarshan Patil Kulkarni; Gururaj H L; Vinayakumar Ravi; Moez Krichen

doi:10.3390/acoustics5010020

Acoustics (Mar 2023)

Affiliations

Shashidhar Rudregowda: Department of Electronics and Communication Engineering, JSS Science and Technology University, Karnataka 570006, India
Sudarshan Patil Kulkarni: Department of Electronics and Communication Engineering, JSS Science and Technology University, Karnataka 570006, India
Gururaj H L: Department of Information Technology, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal 576104, India
Vinayakumar Ravi: Center for Artificial Intelligence, Prince Mohammad Bin Fahd University, Khobar 34754, Saudi Arabia
Moez Krichen: Department of Information Technology, Faculty of Computer Science and Information Technology (FCSIT), Al-Baha University, Alaqiq 65779-7738, Saudi Arabia

DOI: https://doi.org/10.3390/acoustics5010020
Journal volume & issue: Vol. 5, no. 1
pp. 343 – 353

Abstract

Visual speech recognition (VSR) is a method of reading speech by observing the lip movements of the speaker. It depends significantly on the visual features derived from the image sequences. VSR poses various challenges for human- and machine-based procedures, and VSR methods address these tasks using machine learning. Visual speech helps people who are hearing impaired, laryngeal patients, and people in noisy environments. In this research, the authors developed their own dataset for the Kannada language. The dataset contains five randomly chosen words: Avanu, Bagge, Bari, Guruthu, and Helida. The average duration of each video is 1 s to 1.2 s. A machine learning method is used for feature extraction and classification. The authors applied a VGG16 convolutional neural network with the ReLU activation function to this custom dataset and obtained an accuracy of 91.90%. The results are compared with HCNN, ResNet-LSTM, Bi-LSTM, and GLCM-ANN, and the comparison supports the effectiveness of the proposed system.

Published in Acoustics

ISSN: 2624-599X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Physics
Website: https://www.mdpi.com/journal/acoustics
