Jisuanji kexue (Sep 2022)

Sequence-to-Sequence Chinese Continuous Sign Language Recognition and Translation with Multi-layer Attention Mechanism Fusion

  • ZHOU Le-yuan, ZHANG Jian-hua, YUAN Tian-tian, CHEN Sheng-yong

DOI
https://doi.org/10.11896/jsjkx.210800026
Journal volume & issue
Vol. 49, no. 9
pp. 155 – 161

Abstract

Enabling computers to understand the expressions of signers is a challenging task that requires considering not only the temporal and spatial information in sign language videos, but also the complexity of sign language grammar. In continuous sign language recognition, sign language words and sign language actions share a consistent order. In continuous sign language translation, by contrast, the generated natural language sentences have to conform to spoken-language usage, and the word order may not coincide with the action order. To learn signers' expressions more accurately, this paper proposes a novel deep neural network for simultaneous sign language recognition and translation. In this scheme, we explore the effectiveness of different classical pre-trained convolutional neural networks and different multi-layer temporal attention score functions for continuous sign language recognition, and combine them with a Transformer language model to obtain, on top of the recognition output, translations that conform to spoken-language usage. The method is first assessed on Tslrt, the first large-scale Chinese continuous sign language recognition and translation dataset with complex backgrounds. The complex environments and rich action expressions of the signers in Tslrt are used to train our neural network model in a set of comparison experiments, yielding a series of benchmark results. The best word error rates (WER) are 4.8% and 5.1% on the continuous sign language recognition and translation tasks, respectively. To further demonstrate the effectiveness of our method, experiments are conducted on another Chinese continuous sign language recognition dataset, Chinese-CSL, and compared with 13 other methods. The results show that our method reaches a WER of 1.8%, which demonstrates the effectiveness of the proposed method.
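
The WER figures reported in the abstract can be read against the usual definition of word error rate: the word-level edit distance between the predicted and reference sequences, divided by the reference length. The following is a minimal sketch under that assumption. The function name `wer` and the example token lists are hypothetical and are not taken from the paper, whose exact evaluation script is not described here.

```python
def wer(reference, hypothesis):
    """Word error rate, assuming the standard definition:
    (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference tokens
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference tokens
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical gloss sequences, for illustration only.
reference = ["I", "TODAY", "SCHOOL", "GO"]
hypothesis = ["I", "TOMORROW", "SCHOOL"]
score = wer(reference, hypothesis)  # one substitution + one deletion over 4 tokens
```

Under this definition a WER of 4.8% corresponds to roughly one word-level error per 21 reference words. Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.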

Keywords