A Lip Reading Method Based on 3D Convolutional Vision Transformer

Huijuan Wang; Gangqiang Pu; Tingyu Chen

doi:10.1109/ACCESS.2022.3193231

IEEE Access (Jan 2022)

Affiliations

Huijuan Wang: ORCiD; Department of Computer, North China Institute of Aerospace Engineering, Langfang, China
Gangqiang Pu: Department of Computer, North China Institute of Aerospace Engineering, Langfang, China
Tingyu Chen: Department of Computer, North China Institute of Aerospace Engineering, Langfang, China

DOI: https://doi.org/10.1109/ACCESS.2022.3193231
Journal volume & issue: Vol. 10
pp. 77205 – 77212

Abstract

Lip reading, which infers the content of speech from the movement of the speaker’s lips, has received increasing attention in recent years, and the rapid development of deep learning has promoted its progress. However, because lip reading must process the information in continuous video frames, it has to consider both the correlation between adjacent images and the correlation between long-distance images. Moreover, lip reading recognition mainly focuses on the subtle changes of the lips and their surrounding environment, so it is necessary to extract subtle features from small-size images. As a result, the performance of machine lip reading is generally not high, and research progress is slow. To improve the performance of machine lip reading, we propose a lip reading method based on a 3D convolutional vision transformer (3DCvT), which combines a vision transformer with 3D convolution to extract the spatio-temporal features of continuous images and takes full advantage of the properties of convolutions and transformers to extract local and global features from continuous images effectively. The extracted features are then sent to a Bidirectional Gated Recurrent Unit (BiGRU) for sequence modeling. We demonstrated the effectiveness of our method on the large-scale lip reading datasets LRW and LRW-1000 and achieved state-of-the-art performance.
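To make the pipeline described in the abstract more concrete, the following is a minimal sketch that only traces tensor shapes through a 3D convolution, a per-frame transformer stage, and a BiGRU. It is not the authors' implementation: the abstract gives no layer sizes, so the input size (29 frames of 88x88 grayscale crops), the kernel, stride, and padding, the embedding width, the GRU hidden size, and the class count are all illustrative assumptions, and the names `conv_out`, `Conv3dSpec`, and `trace_shapes` are hypothetical. The transformer stage is also simplified to a fixed-width token embedding followed by pooling to one vector per frame.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one axis (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


@dataclass
class Conv3dSpec:
    """Hypothetical 3D convolution settings, ordered (time, height, width)."""
    out_channels: int
    kernel: Tuple[int, int, int]
    stride: Tuple[int, int, int]
    padding: Tuple[int, int, int]


def trace_shapes(frames: int, height: int, width: int, conv: Conv3dSpec,
                 embed_dim: int, gru_hidden: int,
                 num_classes: int) -> Dict[str, tuple]:
    """Trace shapes through 3D conv -> per-frame tokens -> BiGRU -> classifier."""
    shapes: Dict[str, tuple] = {}
    # Assumed input: one grayscale channel, laid out as (C, T, H, W).
    shapes["input"] = (1, frames, height, width)

    # 3D convolution mixes neighbouring frames and neighbouring pixels (local features).
    t = conv_out(frames, conv.kernel[0], conv.stride[0], conv.padding[0])
    h = conv_out(height, conv.kernel[1], conv.stride[1], conv.padding[1])
    w = conv_out(width, conv.kernel[2], conv.stride[2], conv.padding[2])
    shapes["after_conv3d"] = (conv.out_channels, t, h, w)

    # Transformer stage (simplified): each frame's h*w positions become tokens
    # of width embed_dim, so attention can relate distant positions (global features).
    shapes["tokens_per_frame"] = (t, h * w, embed_dim)
    # Assumed pooling to a single feature vector per frame.
    shapes["frame_features"] = (t, embed_dim)

    # BiGRU concatenates forward and backward hidden states at every time step.
    shapes["after_bigru"] = (t, 2 * gru_hidden)
    # Assumed word-level classification over the whole sequence.
    shapes["logits"] = (num_classes,)
    return shapes


# Usage with assumed, illustrative sizes.
spec = Conv3dSpec(out_channels=64, kernel=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3))
for name, shape in trace_shapes(29, 88, 88, spec, embed_dim=512,
                                gru_hidden=1024, num_classes=500).items():
    print(f"{name}: {shape}")
```

With these assumed settings the temporal length is preserved through the 3D convolution (stride 1 in time), so the BiGRU still receives one feature vector per input frame while the spatial resolution is halved.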

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
