Sign language recognition using modified deep learning network and hybrid optimization: a hybrid optimizer (HO) based optimized CNNSa-LSTM approach

Abdullah Baihan; Ahmed I. Alutaibi; Mohammed Alshehri; Sunil Kumar Sharma

doi:10.1038/s41598-024-76174-7

Scientific Reports (Oct 2024)

Affiliations

Abdullah Baihan: Computer Science Department, Community College, King Saud University
Ahmed I. Alutaibi: Department of Computer Engineering, College of Computer and Information Sciences, Majmaah University
Mohammed Alshehri: Department of Information Technology, College of Computer and Information Sciences, Majmaah University
Sunil Kumar Sharma: Department of Information Systems, College of Computer and Information Sciences, Majmaah University

DOI: https://doi.org/10.1038/s41598-024-76174-7
Journal volume & issue: Vol. 14, no. 1
pp. 1 – 22

Abstract

Speech impairment limits a person’s capacity for oral and auditory communication. A real-time sign language detector can improve communication between deaf people and the general public. Recent studies have advanced motion and gesture identification using Deep Learning (DL) methods and computer vision, but developing static and dynamic sign language recognition (SLR) models remains a challenging area of research. The difficulty lies in obtaining a model that handles continuous signs independently of the signer: differences in signing speed, duration, and many other factors between signers make it hard to build a model with high accuracy and continuity. This study focuses on SLR using a modified DL network and a hybrid optimization approach. Spatial and geometric features are extracted via the Visual Geometry Group 16 (VGG16) network, and motion features are extracted using the optical flow approach. A new DL model, CNNSa-LSTM, combines a Convolutional Neural Network (CNN), Self-Attention (SA), and Long Short-Term Memory (LSTM) to identify sign language: the CNN performs spatial analysis, the SA mechanism focuses on relevant features, and the LSTM models temporal dependencies. The proposed CNNSa-LSTM model enhances performance in tasks involving complex, sequential data, such as sign language processing. In addition, a Hybrid Optimizer (HO) is proposed that combines the Hippopotamus Optimization Algorithm (HOA) and the Pathfinder Algorithm (PFA). The proposed model was implemented in Python and evaluated against existing models in terms of accuracy (98.7%), sensitivity (98.2%), precision (98.5%), Word Error Rate (WER) (0.131), Sign Error Rate (SER) (0.114), and Normalized Discounted Cumulative Gain (NDCG) (98%). The proposed model recorded the highest accuracy, 98.7%.
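
For readers unfamiliar with the Word Error Rate reported above, the following is a minimal sketch of how WER is commonly computed, assuming the standard definition (word-level edit distance divided by the number of reference words). The abstract does not state the exact formula the authors used, and the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER under the standard definition (an assumption, not taken from the paper):
    (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical recognizer output compared against a reference transcript.
print(word_error_rate("i want water now", "i need water"))
```

Under this definition a lower value is better, and the value can exceed 1 when the hypothesis contains many inserted words.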

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
