IEEE Access (Jan 2024)

MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition

  • Khan Abrar Shams,
  • Md. Rafid Reaz,
  • Mohammad Ryan Ur Rafi,
  • Sanjida Islam,
  • Md. Shahriar Rahman,
  • Rafeed Rahman,
  • Md. Tanzim Reza,
  • Mohammad Zavid Parvez,
  • Subrata Chakraborty,
  • Biswajeet Pradhan,
  • Abdullah Alamri

DOI
https://doi.org/10.1109/ACCESS.2024.3410837
Journal volume & issue
Vol. 12
pp. 83638 – 83657

Abstract


Sign language is the predominant mode of communication for individuals with auditory impairment. In Bangladesh, Bangla Sign Language (BdSL) is widely used among the hearing-impaired population. However, because the general public has limited awareness of sign language, communicating through BdSL can be challenging. Consequently, there is a growing demand for an automated system capable of efficiently understanding BdSL. For automation, various Deep Learning (DL) architectures can be employed to translate Bangla Sign Language into readable digital text. The automation system incorporates live cameras that continuously capture images, which a DL model then processes. However, factors such as lighting, background noise, skin tone, hand orientation, and other image conditions may introduce uncertainty. To address this, we propose a procedure that reduces these uncertainties by considering three modalities: spatial information, skeleton awareness, and edge awareness. We introduce three image pre-processing techniques alongside three CNN models. The CNN models are combined using nine distinct ensemble meta-learning algorithms, five of which are modifications of averaging and voting techniques. In the result analysis, our individual CNN models achieved training accuracies of 99.77%, 98.11%, and 99.30%, respectively, higher than most other state-of-the-art image classification architectures, except for ResNet50, which achieved 99.87%. Meanwhile, the ensemble model attained the highest accuracy of 95.13% on the testing set, outperforming all individual CNN models. This analysis demonstrates that considering multiple modalities can significantly improve the system's overall performance in hand pattern recognition.
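The averaging and voting combination of the three modality-specific CNN outputs described above can be sketched as follows. This is a minimal illustration of generic soft and hard voting, not the paper's implementation; the probability arrays, class counts, and model names (spatial, skeleton, edge) are assumed for demonstration only.

```python
import numpy as np

def soft_vote(prob_list):
    """Soft voting: average per-class probabilities across models, then argmax."""
    return np.mean(prob_list, axis=0).argmax(axis=1)

def hard_vote(prob_list):
    """Hard voting: majority vote over each model's predicted class."""
    preds = np.stack([p.argmax(axis=1) for p in prob_list])  # shape (models, samples)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, preds)

# Toy softmax outputs for 2 samples and 3 classes from 3 hypothetical
# modality models (values are made up, not from the paper).
p_spatial = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])
p_skeleton = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_edge = np.array([[0.2, 0.5, 0.3], [0.1, 0.1, 0.8]])

probs = [p_spatial, p_skeleton, p_edge]
print(soft_vote(probs))  # averaged probabilities decide each sample's class
print(hard_vote(probs))  # per-model predictions decide by majority
```

Note that the two schemes can disagree: for the second toy sample, two models individually favor class 1 (hard vote), while the averaged probabilities favor class 2 (soft vote), which is why ensemble papers often evaluate several combination rules.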

Keywords