Sensors (Aug 2022)
Novel Spatio-Temporal Continuous Sign Language Recognition Using an Attentive Multi-Feature Network
Abstract
Given video streams, continuous sign language recognition (CSLR) aims to correctly detect unsegmented signs. Despite the growing number of deep learning methods proposed in this area, most rely on a single RGB feature, either the full-frame image or crops of the hands and face. This scarcity of input information heavily constrains the ability to learn multiple complementary features from the video frames during CSLR training. Moreover, exploiting every frame in a video can lead to suboptimal performance, since frames carry different levels of information, ranging from essential features to interfering noise. We therefore propose a novel spatio-temporal continuous sign language recognition method using an attentive multi-feature network, which enhances CSLR by providing extra keypoint features alongside the RGB stream. In addition, we exploit attention layers in both the spatial and temporal modules to simultaneously emphasize multiple important features. Experimental results show that the proposed method achieves superior performance compared with current state-of-the-art methods, with WER scores of 0.76 and 20.56 on the CSL and PHOENIX datasets, respectively.
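To make the abstract's multi-feature, attention-based design concrete, the following is a minimal sketch (not the authors' implementation) of how an attentive multi-feature CSLR model could be organized: per-frame RGB and keypoint feature streams are fused with a per-frame attention weighting, and a temporal attention layer emphasizes informative frames before a sequence model produces gloss logits. All module names, dimensions, and the fusion strategy shown here are illustrative assumptions.

```python
# Illustrative sketch of an attentive multi-feature CSLR model.
# Assumes pre-extracted per-frame RGB and keypoint feature vectors;
# dimensions and gloss vocabulary size are placeholder assumptions.
import torch
import torch.nn as nn


class AttentiveMultiFeatureCSLR(nn.Module):
    def __init__(self, rgb_dim=512, kp_dim=128, hidden=256, num_glosses=1296):
        super().__init__()
        # Project each feature stream to a shared hidden size.
        self.rgb_proj = nn.Linear(rgb_dim, hidden)
        self.kp_proj = nn.Linear(kp_dim, hidden)
        # Spatial (stream-level) attention: weight the two feature streams
        # per frame so the more informative one dominates the fusion.
        self.stream_attn = nn.Linear(2 * hidden, 2)
        # Temporal attention: self-attention over frames to emphasize
        # informative frames and down-weight noisy ones.
        self.temporal_attn = nn.MultiheadAttention(hidden, num_heads=4,
                                                   batch_first=True)
        self.seq_model = nn.LSTM(hidden, hidden, batch_first=True,
                                 bidirectional=True)
        # Per-frame gloss logits; an extra class is reserved for the CTC blank.
        self.classifier = nn.Linear(2 * hidden, num_glosses + 1)

    def forward(self, rgb_feats, kp_feats):
        # rgb_feats: (B, T, rgb_dim), kp_feats: (B, T, kp_dim)
        r = self.rgb_proj(rgb_feats)
        k = self.kp_proj(kp_feats)
        w = torch.softmax(self.stream_attn(torch.cat([r, k], dim=-1)), dim=-1)
        fused = w[..., 0:1] * r + w[..., 1:2] * k            # (B, T, hidden)
        attended, _ = self.temporal_attn(fused, fused, fused)
        seq_out, _ = self.seq_model(attended)
        return self.classifier(seq_out)                       # (B, T, glosses+1)


if __name__ == "__main__":
    model = AttentiveMultiFeatureCSLR()
    rgb = torch.randn(2, 60, 512)   # 2 clips, 60 frames of RGB features
    kp = torch.randn(2, 60, 128)    # matching keypoint features
    print(model(rgb, kp).shape)     # torch.Size([2, 60, 1297])
```

In practice, the per-frame logits would be trained with a CTC loss against unsegmented gloss sequences; the key point of the sketch is that attention operates both across feature streams (spatial) and across frames (temporal), matching the two roles described in the abstract.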
Keywords