IEEE Access (Jan 2023)

Signgraph: An Efficient and Accurate Pose-Based Graph Convolution Approach Toward Sign Language Recognition

  • Neelma Naz,
  • Hasan Sajid,
  • Sara Ali,
  • Osman Hasan,
  • Muhammad Khurram Ehsan

DOI
https://doi.org/10.1109/ACCESS.2023.3247761
Journal volume & issue
Vol. 11
pp. 19135–19147

Abstract

Sign language recognition (SLR) enables the deaf and speech-impaired community to integrate into and communicate effectively with the rest of society. Word-level or isolated SLR is a fundamental yet complex task whose main objective is to correctly recognize signed words. Sign language consists of very fast and complex hand, body, and face movements, as well as mouthing cues, which make the task very challenging. Several input modalities, including RGB, optical flow, RGB-D, and pose/skeleton, have been proposed for SLR. However, these modalities are complex to process, and the state-of-the-art (SOTA) methodologies tend to be exceedingly sophisticated and over-parameterized. In this paper, our focus is on using hand and body poses as the input modality. One major problem in pose-based SLR is extracting the most valuable and distinctive features for all skeleton joints. In this regard, we propose an accurate, efficient, and lightweight pose-based pipeline leveraging a graph convolution network (GCN) along with residual connections and a bottleneck structure. The proposed architecture not only facilitates efficient learning during model training, providing significantly improved accuracy scores, but also alleviates computational complexity. With the proposed architecture in place, we achieve improved accuracies on three different subsets of the WLASL dataset and on the LSA-64 dataset. Our proposed model outperforms previous SOTA pose-based methods, providing relative improvements of 8.91%, 27.62%, and 26.97% on the WLASL-100, WLASL-300, and WLASL-1000 subsets. Moreover, it also outperforms previous SOTA appearance-based methods, providing relative improvements of 2.65% and 5.15% on the WLASL-300 and WLASL-1000 subsets. For the LSA-64 dataset, our model achieves 100% test recognition accuracy. We achieve this improved performance at far less computational cost than existing appearance-based methods.
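
The core building block described in the abstract (a graph convolution over skeleton joints combined with a bottleneck structure and a residual connection) can be illustrated with a minimal sketch. The code below is not the authors' released implementation; the layer sizes, joint count, adjacency handling, and the choice of PyTorch are assumptions made purely for illustration.

# Illustrative sketch only: a spatial graph-convolution block over pose keypoints
# with a channel bottleneck and a residual connection, in the spirit of the
# architecture described in the abstract. All sizes below are assumptions.
import torch
import torch.nn as nn

class BottleneckGCNBlock(nn.Module):
    """Graph convolution over skeleton joints with a bottleneck and a residual add."""

    def __init__(self, in_channels: int, out_channels: int, bottleneck_ratio: int = 4):
        super().__init__()
        mid = max(out_channels // bottleneck_ratio, 1)
        self.reduce = nn.Linear(in_channels, mid)       # bottleneck: compress channels
        self.expand = nn.Linear(mid, out_channels)      # restore channel width
        self.bn = nn.BatchNorm1d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Residual path: identity if shapes match, otherwise a linear projection.
        self.residual = (nn.Identity() if in_channels == out_channels
                         else nn.Linear(in_channels, out_channels))

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (batch, joints, in_channels) pose features
        # adj: (joints, joints) normalized skeleton adjacency matrix
        h = torch.einsum("vu,buc->bvc", adj, x)         # aggregate neighboring joints
        h = self.expand(self.relu(self.reduce(h)))      # bottleneck transform
        h = self.bn(h.transpose(1, 2)).transpose(1, 2)  # batch-norm over channels
        return self.relu(h + self.residual(x))          # residual connection

# Example usage with hypothetical sizes: 27 hand/body joints, 2-D keypoints.
block = BottleneckGCNBlock(in_channels=2, out_channels=64)
poses = torch.randn(8, 27, 2)                           # (batch, joints, coords)
adj = torch.eye(27)                                     # placeholder adjacency
out = block(poses, adj)                                 # -> (8, 27, 64)

Stacking several such blocks over a pose sequence, followed by temporal pooling and a classifier head, would give a lightweight recognizer along the lines of the pipeline summarized above; the paper itself should be consulted for the exact layer configuration.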

Keywords