Mamba vision models: Automated American Sign Language recognition
Ali Salem Altaher,
Chiron Bang,
Bader Alsharif,
Ahmed Altaher,
Munid Alanazi,
Hasan Altaher,
Hanqi Zhuang
Affiliations
Ali Salem Altaher
Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, 33431, FL, USA; College of Medicine, Ibn Sina University of Medical and Pharmaceutical Science, Baghdad, Iraq; Corresponding author.
Chiron Bang
Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, 33431, FL, USA
Bader Alsharif
Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, 33431, FL, USA; Department of Computer Science and Engineering, College of Telecommunication and Information, Technical and Vocational Training Corporation, Riyadh, 12464, Saudi Arabia
Ahmed Altaher
Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, 33431, FL, USA; Electronic Computer Center, Al-Nahrain University, Jadriya, Baghdad, Iraq
Munid Alanazi
Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, 33431, FL, USA; Business Informatics Department, College of Business, King Khalid University, Abha, Saudi Arabia
Hasan Altaher
Information and Communication Technology Department, Baghdad Institute of Technology, Middle Technical University, Baghdad, Iraq
Hanqi Zhuang
Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, 33431, FL, USA
Historically, individuals with hearing impairments have faced significant challenges in effective communication due to the lack of adequate resources. Recent technological advancements have spurred the development of innovative tools aimed at enhancing the quality of life for those with hearing disabilities. This research focuses on the application of Vision Mamba models for classifying hand gestures representing the American Sign Language (ASL) alphabet, with a detailed comparative analysis of their performance against 13 deep learning architectures. The Vision Mamba models, namely Vision Mamba and Remote Sensing Mamba, were trained on a substantial dataset comprising 87,000 images of ASL hand gestures, and their accuracy and performance were optimized through iterative fine-tuning of architectural parameters. Experimental results demonstrated that the Vision Mamba models outperformed all other models previously examined in this context, achieving an exceptional accuracy rate exceeding 99.98% with lower architectural complexity. These findings highlight the potential of deep learning technologies, particularly the Mamba vision models, in advancing assistive technologies. They offer a sophisticated and highly accurate tool for interpreting ASL hand gestures, promising improved communication and accessibility for individuals with hearing impairments.
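For concreteness, the sketch below illustrates how a fine-tuning setup of this kind might look in PyTorch. It is an assumption-laden illustration rather than the authors' actual pipeline: the vision_mamba import path, the VisionMamba constructor arguments, the 29-class split of the 87,000-image dataset, and the training hyperparameters are hypothetical placeholders to be replaced by the actual Vision Mamba / Remote Sensing Mamba implementation and dataset configuration used in the experiments.

# Minimal fine-tuning sketch (illustrative only).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

NUM_CLASSES = 29   # assumed: 26 letters plus space/delete/nothing classes
IMAGE_SIZE = 224

# Standard preprocessing for the ASL hand-gesture images.
transform = transforms.Compose([
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# Assumes the training images are arranged in one folder per class.
train_set = datasets.ImageFolder("asl_alphabet_train", transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

# Hypothetical backbone import; replace with the actual Vision Mamba or
# Remote Sensing Mamba model class and its real constructor arguments.
from vision_mamba import VisionMamba
model = VisionMamba(img_size=IMAGE_SIZE, num_classes=NUM_CLASSES)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
criterion = nn.CrossEntropyLoss()

# Plain supervised fine-tuning loop over the ASL alphabet classes.
model.train()
for epoch in range(10):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()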