STFE-Net: A Spatial-Temporal Feature Extraction Network for Continuous Sign Language Translation

Jiwei Hu; Yunfei Liu; Kin-Man Lam; Ping Lou

doi:10.1109/ACCESS.2023.3234743

IEEE Access (Jan 2023)

Affiliations

Jiwei Hu: School of Information Engineering, Wuhan University of Technology, Wuhan, China
Yunfei Liu: School of Information Engineering, Wuhan University of Technology, Wuhan, China
Kin-Man Lam: Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, China
Ping Lou: School of Information Engineering, Wuhan University of Technology, Wuhan, China

DOI: https://doi.org/10.1109/ACCESS.2023.3234743
Journal volume & issue: Vol. 11, pp. 46204–46217

Abstract

The main challenge in continuous sign language translation (CSLT) is extracting discriminative spatial and temporal features. This paper proposes a spatial-temporal feature extraction network (STFE-Net) for CSLT, which fuses the spatial features extracted by a spatial feature extraction network (SFE-Net) with the temporal features extracted by a temporal feature extraction network (TFE-Net). SFE-Net performs pose estimation for the presenters in sign-language videos. Starting from the 133 key points of COCO-WholeBody, 53 key points are retained according to the characteristics of sign language. High-resolution pose estimation is performed on the hands, alongside whole-body pose estimation, to obtain finer-grained hand features. The spatial features extracted by SFE-Net and the sign language words are then fed to TFE-Net, which is based on a Transformer with relative position encoding. A dataset for Chinese continuous sign language was created and used for evaluation, on which STFE-Net achieves Bilingual Evaluation Understudy (BLEU-1, BLEU-2, BLEU-3, BLEU-4) scores of 77.59, 75.62, 74.25 and 72.14, respectively. STFE-Net was also evaluated on two public datasets, RWTH-Phoenix-Weather 2014T and CLS. On the former, it achieves BLEU-1, BLEU-2, BLEU-3 and BLEU-4 scores of 48.22, 33.59, 26.41 and 22.45, respectively; on the latter, the corresponding scores are 61.54, 58.76, 57.93 and 57.52. Experimental results show that the model achieves promising performance. Readers who need the code or dataset may email [email protected].
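For readers unfamiliar with the BLEU-1 to BLEU-4 figures quoted above, the following is a minimal sketch of corpus-level BLEU-n under simplifying assumptions: one reference per hypothesis, uniform n-gram weights, pre-tokenized input, and no smoothing. The abstract does not state which scoring script, tokenization, or smoothing the authors used, so the function name `corpus_bleu` and these choices are illustrative only and will not necessarily reproduce the reported numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU-max_n on a 0-100 scale.

    Assumptions (not taken from the paper): a single reference per
    hypothesis, uniform weights over n = 1..max_n, and no smoothing.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # n-grams in the hypotheses, per order
    ref_len = hyp_len = 0

    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    refs = [["a", "b", "c", "d"]]
    hyps = [["a", "b", "x", "d"]]
    for n in range(1, 5):
        print(f"BLEU-{n}: {corpus_bleu(refs, hyps, max_n=n):.2f}")
```

Because BLEU-n multiplies in the precision of each longer n-gram order, the score can only stay level or fall as n grows, which matches the decreasing pattern from BLEU-1 to BLEU-4 in the reported results.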

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
