Continuous sign language recognition based on hierarchical memory sequence network

Cuihong Xue; Jingli Jia; Ming Yu; Gang Yan; Yingchun Guo; Yuehao Liu

doi:10.1049/cvi2.12240

IET Computer Vision (Mar 2024)

Affiliations

Cuihong Xue: Technical College for the Deaf Tianjin University of Technology Tianjin China
Jingli Jia: School of Artificial Intelligence Hebei University of Technology Tianjin China
Ming Yu: School of Artificial Intelligence Hebei University of Technology Tianjin China
Gang Yan: School of Artificial Intelligence Hebei University of Technology Tianjin China
Yingchun Guo: School of Artificial Intelligence Hebei University of Technology Tianjin China
Yuehao Liu: School of Artificial Intelligence Hebei University of Technology Tianjin China

DOI: https://doi.org/10.1049/cvi2.12240
Journal volume & issue: Vol. 18, no. 2
pp. 247 – 259

Abstract

To address the lack of strong supervision when training feature extractors and the limited temporal information that a single sequence model can learn, a hierarchical memory sequence network with a multi‐level iterative optimisation strategy is proposed for continuous sign language recognition. The method uses a spatial‐temporal fusion convolution network (STFC‐Net) to extract spatial‐temporal information from RGB and optical flow video frames, yielding multi‐modal visual features of a sign language video. To strengthen the temporal relationships among the visual feature maps, the hierarchical memory sequence network then captures local utterance features and global context dependencies across the time dimension to produce sequence features. Finally, a decoder produces the output sentence sequence. To strengthen the feature extractor, a multi‐level iterative optimisation strategy is adopted to fine‐tune STFC‐Net and the utterance feature extractor. Experimental results on the RWTH‐Phoenix‐Weather multi‐signer 2014 dataset and the Chinese sign language dataset demonstrate the effectiveness and superiority of the method.
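
The local-then-global idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the function names, the concatenation-based fusion, the mean pooling and the window parameters are all assumptions chosen for illustration, standing in for STFC‐Net and the learned memory modules, which the abstract does not specify in detail.

```python
from typing import List

Vector = List[float]


def fuse_modalities(rgb: List[Vector], flow: List[Vector]) -> List[Vector]:
    """Concatenate per-frame RGB and optical-flow features (assumed fusion rule)."""
    if len(rgb) != len(flow):
        raise ValueError("RGB and flow streams must have the same number of frames")
    return [r + f for r, f in zip(rgb, flow)]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def local_utterance_features(frames: List[Vector], window: int, stride: int) -> List[Vector]:
    """Summarise each sliding window of frames into one local feature.

    Mean pooling is a hypothetical stand-in for a learned utterance-level extractor.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not frames:
        return []
    last_start = max(len(frames) - window, 0)
    return [mean_vector(frames[s:s + window]) for s in range(0, last_start + 1, stride)]


def add_global_context(local_feats: List[Vector]) -> List[Vector]:
    """Append a sequence-wide context vector to every local feature.

    The global mean is a hypothetical stand-in for a learned global sequence model.
    """
    if not local_feats:
        return []
    context = mean_vector(local_feats)
    return [feat + context for feat in local_feats]


if __name__ == "__main__":
    # Toy one-dimensional features for four frames in each modality.
    rgb = [[1.0], [3.0], [5.0], [7.0]]
    flow = [[0.0], [2.0], [4.0], [6.0]]
    fused = fuse_modalities(rgb, flow)
    local = local_utterance_features(fused, window=2, stride=2)
    print(add_global_context(local))
```

In this sketch each output vector carries both a local summary of a short run of frames and a summary of the whole sequence, which is the kind of sequence feature a decoder would then consume.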

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
