A Study on pose-based deep learning models for gloss-free Sign Language Translation

Pedro Dal Bianco; Gastón Ríos; Waldo Hasperué; Oscar Stanchi; Facundo Quiroga; Franco Ronchetti

doi:10.24215/16666038.24.e09

Journal of Computer Science and Technology (Oct 2024)

Affiliations

Pedro Dal Bianco: III-LIDI, Facultad de Informática, UNLP
Gastón Ríos: Instituto de Investigación en Informática LIDI, Universidad Nacional de La Plata, Argentina
Waldo Hasperué: Instituto de Investigación en Informática LIDI, Universidad Nacional de La Plata, Argentina
Oscar Stanchi: Instituto de Investigación en Informática LIDI, Universidad Nacional de La Plata, Argentina
Facundo Quiroga: Instituto de Investigación en Informática LIDI, Universidad Nacional de La Plata, Argentina
Franco Ronchetti: Instituto de Investigación en Informática LIDI, Universidad Nacional de La Plata, Argentina

DOI: https://doi.org/10.24215/16666038.24.e09
Journal volume & issue: Vol. 24, no. 2

Abstract

Sign Language Translation (SLT) is a challenging task due to its cross-domain nature, different grammars and lack of data. Currently, many SLT models rely on intermediate gloss annotations as outputs or latent priors. Glosses can help models correctly segment and align signs to better understand the video. However, the use of glosses comes with significant limitations, since obtaining annotations is quite difficult. Therefore, scaling gloss-based models to millions of samples remains impractical, especially considering the scarcity of sign language datasets. Similarly, many models use video data, which requires larger models that typically only run on high-end GPUs and are less invariant to signers' appearance and context. In this work we propose a gloss-free pose-based SLT model. Using the extracted pose as a feature allows for a significant reduction in the dimensionality of the data and the size of the model. We evaluate the state of the art, compare available models and develop a keypoint-based Transformer model for gloss-free SLT, trained on RWTH-Phoenix, a standard dataset for benchmarking SLT models, alongside GSL, a simpler laboratory-made Greek Sign Language dataset.
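
The dimensionality argument in the abstract can be illustrated with a rough per-frame count. The sketch below is only a back-of-the-envelope comparison: the frame resolution (224x224 RGB) and the number of keypoints (133, each with 2D coordinates) are hypothetical values chosen for illustration and are not taken from the paper.

```python
def video_frame_dim(height: int, width: int, channels: int = 3) -> int:
    """Number of scalar values in one raw video frame."""
    return height * width * channels


def pose_frame_dim(num_keypoints: int, coords_per_keypoint: int = 2) -> int:
    """Number of scalar values in one frame of pose keypoints."""
    return num_keypoints * coords_per_keypoint


# Hypothetical settings, for illustration only (not from the paper).
HEIGHT, WIDTH = 224, 224   # assumed input resolution of a video model
NUM_KEYPOINTS = 133        # assumed number of body, hand and face keypoints

video_dim = video_frame_dim(HEIGHT, WIDTH)
pose_dim = pose_frame_dim(NUM_KEYPOINTS)
reduction = video_dim / pose_dim

print(f"values per video frame: {video_dim}")
print(f"values per pose frame:  {pose_dim}")
print(f"reduction factor:       {reduction:.0f}x")
```

Under these assumed settings each pose frame carries a few hundred values instead of roughly 150 thousand, which is the kind of input reduction that lets a smaller model process the same sequence.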

Published in Journal of Computer Science and Technology

ISSN: 1666-6046 (Print); 1666-6038 (Online)
Publisher: Postgraduate Office, School of Computer Science, Universidad Nacional de La Plata
Country of publisher: Argentina
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://journal.info.unlp.edu.ar
