Isolated single sound lip-reading using a frame-based camera and event-based camera

Tatsuya Kanamaru; Taiki Arakane; Takeshi Saitoh

doi:10.3389/frai.2022.1070964

Frontiers in Artificial Intelligence (Jan 2023)

DOI: https://doi.org/10.3389/frai.2022.1070964
Journal volume & issue: Vol. 5

Abstract

Unlike the conventional frame-based camera, the event-based camera detects changes in the brightness value of each pixel over time. This research addresses lip-reading as a new application of the event-based camera. This paper proposes an event camera-based lip-reading method for isolated single sound recognition. The proposed method consists of imaging from event data, face and facial feature point detection, and recognition using a Temporal Convolutional Network. Furthermore, this paper proposes a method that combines the two modalities of a frame-based camera and an event-based camera. To evaluate the proposed method, utterance scenes of 15 Japanese consonants from 20 speakers were collected using an event-based camera and a video camera, and an original dataset was constructed. Several experiments were conducted by generating images at multiple frame rates from the event-based camera. As a result, the highest recognition accuracy was obtained with the event-based camera images at 60 fps. Moreover, it was confirmed that combining the two modalities yields higher recognition accuracy than a single modality.
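
The abstract does not say how images are generated from the event data, so the following is only a minimal sketch of one common approach: each event's polarity is summed into the frame whose time window (of length 1/fps) contains it. The event layout `(timestamp, x, y, polarity)`, the function name `events_to_frames`, and the signed-sum accumulation rule are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# Assumed event layout (not taken from the paper):
# (timestamp in seconds, x, y, polarity as +1 or -1)
Event = Tuple[float, int, int, int]
Frame = List[List[int]]


def events_to_frames(events: List[Event], width: int, height: int,
                     fps: float) -> List[Frame]:
    """Accumulate events into frames, each covering a window of 1/fps seconds.

    Every pixel holds the signed sum of the polarities of the events
    that fell on it during that frame's time window.
    """
    if not events:
        return []
    t0 = min(e[0] for e in events)
    t1 = max(e[0] for e in events)
    n_frames = int((t1 - t0) * fps) + 1
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for t, x, y, polarity in events:
        # Index of the window containing this event, clamped for safety.
        idx = min(int((t - t0) * fps), n_frames - 1)
        frames[idx][y][x] += polarity
    return frames


# Hypothetical events on a 2x2 sensor.
sample = [(0.0, 0, 0, 1), (0.005, 1, 0, -1), (0.02, 0, 1, 1)]
frames_60 = events_to_frames(sample, width=2, height=2, fps=60)
frames_30 = events_to_frames(sample, width=2, height=2, fps=30)
```

Because the frame rate is only a parameter of the accumulation step, the same recording can be rendered at several rates, which is what makes a comparison across frame rates possible from a single event stream.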

Published in Frontiers in Artificial Intelligence

ISSN: 2624-8212 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.frontiersin.org/journals/artificial-intelligence#
