Jisuanji kexue yu tansuo (Dec 2021)

Review of Extracting Methods for Lip Visual Features

  • MA Jinlin, GONG Yuanwen, MA Ziping, CHEN Deguang, ZHU Yanbin, LIU Yuhao

DOI
https://doi.org/10.3778/j.issn.1673-9418.2106105
Journal volume & issue
Vol. 15, no. 12
pp. 2256 – 2275

Abstract

Current research on lip recognition focuses on improving recognition accuracy and on studying features of multimodal inputs, while little attention has been paid to improving the effectiveness of lip visual features. Lip visual information plays a key role in visual speech recognition and lip recognition, and it is especially important when the audio signal is corrupted or carries no information. Obtaining accurate and effective lip visual features is one of the most difficult tasks in lip recognition. This paper reviews recent research on lip recognition from three aspects: lip datasets, traditional visual feature extraction methods, and deep learning methods for visual feature extraction. First, the paper summarizes the datasets used for lip recognition, dividing them into two types, frontal-view and multi-view, and describing the characteristics, limitations, and download addresses of each type. Second, it introduces the traditional methods of lip visual feature extraction, grouped into pixel-based, shape-based, and hybrid features, covering the basic idea, network structure, and characteristics of each method. For deep learning methods of lip visual feature extraction, it describes the network structure, advantages, and disadvantages of four categories: 2D CNN, 3D CNN, 2D CNN combined with 3D CNN, and other neural networks. The performance of these methods on public datasets is compared. Finally, the paper discusses the challenges facing lip visual feature extraction methods and future research trends.

Keywords