A method of multimodal machine sign language translation for natural human-computer interaction

Alexandr A. Axyonov; Ildar A. Kagirov; Dmitry A. Ryumin

doi:10.17586/2226-1494-2022-22-3-585-593

Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki (Jun 2022)

Affiliations

Alexandr A. Axyonov: Junior Researcher, Saint Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), Saint Petersburg, 199178, Russian Federation; Scopus ID: 57203963345
Ildar A. Kagirov: Scientific Researcher, Saint Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), Saint Petersburg, 199178, Russian Federation; Scopus ID: 25121369400
Dmitry A. Ryumin: PhD, Senior Researcher, Saint Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), Saint Petersburg, 199178, Russian Federation; Scopus ID: 57191960214

Journal volume & issue: Vol. 22, no. 3
pp. 585 – 593

Abstract

This paper investigates whether the robustness of an automatic system for recognizing isolated signs in sign languages can be enhanced by using the most informative spatiotemporal visual features. The authors present a method for the automatic recognition of gestural information based on an integrated neural network model that analyzes the following spatiotemporal visual features: 2D and 3D distances between the palm and the face; the area of intersection between the hand and the face; hand configuration; and the gender and age of the signers. A neural network model based on 3DResNet-18 was developed to extract hand configuration data. Neural network models from the Deepface software platform were embedded in the method to extract gender- and age-related data. The proposed method was tested on data from TheRuSLan multimodal corpus of sign language elements, achieving an accuracy of 91.14 %. The results of this investigation not only improve the accuracy and robustness of machine sign language translation but also enhance the naturalness of human-machine interaction in general. In addition, the results can be applied in social services, medicine, education and robotics, as well as in various public service centers.
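To make the geometric hand-face features listed above more concrete, here is a minimal sketch of how the 2D distance, the 3D distance and the hand-face intersection area could be computed for a single video frame. It is not the authors' implementation. It assumes that an upstream detector already supplies a palm center and a face center as (x, y, z) points in one consistent unit, and axis-aligned bounding boxes for the hand and the face; the names `Box`, `distance_2d`, `distance_3d`, `intersection_area` and `hand_face_features` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: (x, y, z) with all three coordinates in the same unit.
Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box: (x_min, y_min) to (x_max, y_max)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def distance_2d(palm: Point3D, face: Point3D) -> float:
    """Euclidean palm-face distance in the image plane (depth z is ignored)."""
    return math.hypot(palm[0] - face[0], palm[1] - face[1])


def distance_3d(palm: Point3D, face: Point3D) -> float:
    """Euclidean palm-face distance including depth (assumes z is available)."""
    return math.dist(palm, face)


def intersection_area(hand: Box, face: Box) -> float:
    """Area where the hand box overlaps the face box; 0.0 if they do not overlap."""
    width = min(hand.x_max, face.x_max) - max(hand.x_min, face.x_min)
    height = min(hand.y_max, face.y_max) - max(hand.y_min, face.y_min)
    return max(0.0, width) * max(0.0, height)


def hand_face_features(palm: Point3D, face: Point3D,
                       hand_box: Box, face_box: Box) -> List[float]:
    """Per-frame feature vector: [2D distance, 3D distance, intersection area]."""
    return [
        distance_2d(palm, face),
        distance_3d(palm, face),
        intersection_area(hand_box, face_box),
    ]


if __name__ == "__main__":
    palm, face = (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)
    hand_box = Box(0.0, 0.0, 4.0, 4.0)
    face_box = Box(2.0, 2.0, 6.0, 6.0)
    print(hand_face_features(palm, face, hand_box, face_box))  # -> [5.0, 13.0, 4.0]
```

In a sequence model, a vector like this would typically be computed for every frame and combined with the other features (hand configuration, gender, age); how the paper actually fuses them is not specified in this abstract.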

Published in Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki

ISSN: 2226-1494 (Print); 2500-0373 (Online)
Publisher: Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)
Country of publisher: Russian Federation
LCC subjects: Science: Physics: Optics. Light; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://ntv.ifmo.ru/en/english.htm
