IEEE Access (Jan 2024)

Joint Motion Affinity Maps (JMAM) and Their Impact on Deep Learning Models for 3D Sign Language Recognition

  • P. V. V. Kishore,
  • D. Anil Kumar,
  • Rama Chaithanya Tanguturi,
  • K. Srinivasarao,
  • P. Praveen Kumar,
  • D. Srihari

DOI
https://doi.org/10.1109/ACCESS.2024.3354775
Journal volume & issue
Vol. 12
pp. 11258–11275

Abstract

Previous works on 3D joint-based feature representations of the human body as colour-coded images (maps) were built from joint positions, distances, angles, or combinations of them, for applications such as human action (sign language) recognition. These 3D joint maps have been shown to singularly characterize both the spatial and temporal relationships between the skeletal joints describing an action (sign). Consequently, the joint position and motion identification problem is transformed into an image classification problem for 3D skeletal sign language (action) recognition. However, the previously proposed process of transforming 3D skeletal joints into colour-coded maps has a negative proportionality component, which yields maps with small pixel densities precisely when the joint relationships are strong. This drawback severely limits a classifier's ability to learn the joint relationships encoded in the colour-coded maps. We hypothesized that a positive proportionality between joint motions and the corresponding maps would improve classifier performance; hence, we propose joint motion affinity maps (JMAMs). JMAMs apply a radial basis function kernel to joint distances, which ensures a positive proportionality constant between joint motions and the pixel densities of the colour-coded maps. To further improve 3D sign language classification, this work additionally groups congruent body-part joints, which results in motion-directed JMAMs with maximally discriminative, positive-definite spatio-temporal features. Finally, the JMAMs are trained on the proposed multi-resolution convolutional neural network with spatial attention (MRCNNSA) architecture, which produces strong results on the constructed 3D sign language dataset, KL3DISL. The proposed method is further benchmarked against standard deep learning models on publicly available 3D datasets for both sign and action recognition. The results show that JMAMs with clustered joints characterize subtle joint relationships that are otherwise difficult for a classifier to learn.
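
To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of how an RBF kernel over pairwise joint distances can produce maps whose pixel densities grow with joint motion rather than shrink. The function name jmam_sketch, the complemented-Gaussian form, and the sigma value are illustrative assumptions; the abstract states only that a radial basis kernel on joint distances is used.

import numpy as np

def jmam_sketch(joints, sigma=0.1):
    """Sketch of a joint motion affinity map from a skeletal sequence.

    joints: array of shape (T, N, 3) -- T frames, N 3D joints.
    Returns an array of shape (T-1, N, N) with values in [0, 1),
    one affinity matrix per frame transition.
    """
    # Pairwise joint distances per frame: shape (T, N, N).
    diff = joints[:, :, None, :] - joints[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Frame-to-frame change in pairwise distance as a proxy for joint motion.
    motion = np.abs(np.diff(dist, axis=0))
    # Complemented Gaussian RBF: larger motion -> larger pixel value,
    # i.e. the positive proportionality the paper argues for.
    # (Exact kernel form and sigma are assumptions, not from the paper.)
    return 1.0 - np.exp(-(motion ** 2) / (2.0 * sigma ** 2))

Stacking the (T-1, N, N) tensor along time and passing it through a colormap would then yield the colour-coded image that a CNN classifier consumes, in the spirit of the map-based pipeline the abstract describes.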

Keywords