Technologies (Oct 2024)
SWL-LSE: A Dataset of Health-Related Signs in Spanish Sign Language with an ISLR Baseline Method
Abstract
Progress in automatic sign language recognition and translation has been hindered by the scarcity of datasets available for the training of machine learning algorithms, a challenge that is even more acute for languages with smaller signing communities, such as Spanish. In this paper, we introduce a dataset of 300 isolated signs in Spanish Sign Language, collected online via a web application with contributions from 124 participants, resulting in a total of 8000 instances. This dataset, which is openly available, includes keypoints extracted using MediaPipe Holistic. The goal of this paper is to describe the construction and characteristics of the dataset and to provide a baseline classification method using a spatial–temporal graph convolutional network (ST-GCN) model, encouraging the scientific community to improve upon it. The experimental section offers a comparative analysis of the method’s performance on the new dataset, as well as on two other well-known datasets. The dataset, code, and web app used for data collection are freely available, and the web app can also be used to test classifier performance on-line in real-time.
Keywords