IEEE Access (Jan 2023)
Event-Based Gesture and Facial Expression Recognition: A Comparative Analysis
Abstract
Event-based cameras are novel vision sensors that respond to local variations in intensity, generating asynchronous per-pixel outputs, referred to as events, with low latency, high temporal resolution, and high dynamic range. These events encode the spatio-temporal dynamics of a scene. Given the temporal nature of the asynchronous event stream, several works have addressed the recognition of deformable objects in motion, specifically gestures. However, another category of deformable objects, facial expressions, has yet to be adequately addressed. In this paper, we present a comprehensive review of two tasks of interest for event-based cameras: gesture and facial expression recognition. For both tasks, we evaluate two existing state-of-the-art learning models, together with a third model that learns from the temporal and spatial correlations of events. To this end, we evaluate a wide range of classification models across multiple scenarios, analysing, amongst other factors, the time/event cut-off window of each sample, the number of samples per class in each database, and the spatial resolution of the databases. For gesture recognition, we use existing databases, while for facial expression recognition we have synthetically generated two entirely new databases (derived from two state-of-the-art image databases): e-CK+ and e-MMI, with promising results for the future of this area. Finally, we release our contributions to the community, specifically the databases developed and used in this study.
Keywords