Sensors (Jul 2022)

One Model is Not Enough: Ensembles for Isolated Sign Language Recognition

  • Marek Hrúz,
  • Ivan Gruber,
  • Jakub Kanis,
  • Matyáš Boháček,
  • Miroslav Hlaváč,
  • Zdeněk Krňoul

DOI
https://doi.org/10.3390/s22135043
Journal volume & issue
Vol. 22, no. 13
p. 5043

Abstract

In this paper, we study sign language recognition, focusing on the recognition of isolated signs. The task is framed as a classification problem in which a sequence of frames (i.e., images) is assigned to one of a given set of sign language glosses. We analyze two appearance-based approaches, I3D and TimeSformer, and one pose-based approach, SPOTER. The appearance-based approaches are trained on several data modalities, whereas SPOTER is evaluated with different types of preprocessing. All methods are tested on two publicly available datasets: AUTSL and WLASL300. We experiment with ensemble techniques and achieve a new state-of-the-art result of 73.84% accuracy on the WLASL300 dataset, using the CMA-ES optimization method to find the best ensemble weights. Furthermore, we present an ensembling technique based on the Transformer model, which we call the Neural Ensembler.
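
As a rough illustration of the weighted-ensemble idea mentioned in the abstract, the sketch below combines per-model class scores with one scalar weight per model and tunes those weights on held-out data. It is not the authors' implementation: the optimizer here is a simple (1+1) evolution strategy used as a much simpler stand-in for CMA-ES, and the function names (`ensemble_predict`, `accuracy`, `tune_weights`) and the toy validation data are assumptions made purely for illustration.

```python
import random

def ensemble_predict(model_probs, weights):
    """Return the arg-max class of the weighted sum of per-model class scores."""
    n_classes = len(model_probs[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, model_probs))
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=combined.__getitem__)

def accuracy(dataset, weights):
    """dataset is a list of (model_probs, label) pairs."""
    hits = sum(ensemble_predict(mp, weights) == y for mp, y in dataset)
    return hits / len(dataset)

def tune_weights(dataset, n_models, steps=200, sigma=0.3, seed=0):
    """(1+1) evolution strategy (stand-in for CMA-ES, NOT CMA-ES itself):
    mutate the weights with Gaussian noise and keep the candidate
    whenever its accuracy is not worse than the current best."""
    rng = random.Random(seed)
    best = [1.0 / n_models] * n_models
    best_acc = accuracy(dataset, best)
    for _ in range(steps):
        cand = [max(0.0, w + rng.gauss(0.0, sigma)) for w in best]
        acc = accuracy(dataset, cand)
        if acc >= best_acc:
            best, best_acc = cand, acc
    return best, best_acc

# Hypothetical validation set: 3 models, 2 classes, 3 samples.
val = [
    ([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]], 0),
    ([[0.2, 0.8], [0.6, 0.4], [0.7, 0.3]], 1),
    ([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]], 0),
]
base_acc = accuracy(val, [1 / 3] * 3)
tuned_weights, tuned_acc = tune_weights(val, n_models=3)
```

Because a candidate is only accepted when it does not lower accuracy, the tuned weights can never score below the uniform starting point on the data they were tuned on. Whether that gain carries over to a test set is a separate question, which is why such weights would normally be fitted on a validation split.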

Keywords