Computers (Feb 2022)

Multimodal Lip-Reading for Tracheostomy Patients in the Greek Language

  • Yorghos Voutos,
  • Georgios Drakopoulos,
  • Georgios Chrysovitsiotis,
  • Zoi Zachou,
  • Dimitris Kikidis,
  • Efthymios Kyrodimos,
  • Themis Exarchos

DOI
https://doi.org/10.3390/computers11030034
Journal volume & issue
Vol. 11, no. 3
p. 34

Abstract


Voice loss is a serious disorder that is strongly associated with social isolation. Multimodal information sources, such as audiovisual recordings, are valuable because they enable the development of straightforward, personalized word prediction models that can reproduce the patient's original voice. In this work, we designed a multimodal approach based on audiovisual information collected from patients before loss of voice in order to develop a system for automated lip-reading in the Greek language. Data pre-processing methods, such as lip segmentation and frame-level sampling, were used to enhance the quality of the imaging data. Audio information was incorporated into the model to automatically annotate sets of frames as words. Recurrent neural networks were trained on four different video recordings to develop a robust word prediction model. The model correctly identified test words across different time frames with 95% accuracy. To our knowledge, this is the first word prediction model trained to recognize words from video recordings in the Greek language.
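
To illustrate the kind of recurrent word prediction model the abstract describes, the sketch below shows a minimal LSTM classifier over sequences of pre-segmented lip-region frames. This is not the authors' implementation: the frame crop size, sequence length, hidden dimensions, vocabulary size, and PyTorch framework choice are all assumptions made for illustration only.

```python
# Minimal sketch (assumptions throughout, not the paper's code): an LSTM-based
# word classifier over sequences of flattened lip-region crops, mirroring the
# general "frames in, word label out" setup described in the abstract.
import torch
import torch.nn as nn


class LipReadingRNN(nn.Module):
    def __init__(self, frame_feat_dim=64 * 32, hidden_dim=256, vocab_size=50):
        super().__init__()
        # Each lip frame (e.g. a hypothetical 64x32 grayscale crop) is
        # flattened and projected to a compact feature vector.
        self.encoder = nn.Sequential(
            nn.Linear(frame_feat_dim, 512),
            nn.ReLU(),
        )
        # A recurrent layer models the temporal evolution of the lip shapes.
        self.rnn = nn.LSTM(512, hidden_dim, num_layers=2, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frames):
        # frames: (batch, seq_len, frame_feat_dim), one lip crop per time step
        feats = self.encoder(frames)
        _, (h_n, _) = self.rnn(feats)
        # The final hidden state predicts a single word label per clip.
        return self.classifier(h_n[-1])


if __name__ == "__main__":
    model = LipReadingRNN()
    # Dummy batch: 8 clips of 25 frames each, flattened 64x32 lip crops.
    clips = torch.randn(8, 25, 64 * 32)
    labels = torch.randint(0, 50, (8,))
    logits = model(clips)
    loss = nn.functional.cross_entropy(logits, labels)
    print(logits.shape, loss.item())
```

In the paper's setting, the word labels for each set of frames would come from the audio track, which is used to segment and annotate the video at the word level before training.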

Keywords