Sensors (Jun 2020)
Spatio-Temporal Representation of an Electoencephalogram for Emotion Recognition Using a Three-Dimensional Convolutional Neural Network
Abstract
Emotion recognition plays an important role in the field of human–computer interaction (HCI). An electroencephalogram (EEG) is widely used to estimate human emotion owing to its convenience and mobility. Deep neural network (DNN) approaches using an EEG for emotion recognition have recently shown remarkable improvement in terms of their recognition accuracy. However, most studies in this field still require a separate process for extracting handcrafted features despite the ability of a DNN to extract meaningful features by itself. In this paper, we propose a novel method for recognizing an emotion based on the use of three-dimensional convolutional neural networks (3D CNNs), with an efficient representation of the spatio-temporal representations of EEG signals. First, we spatially reconstruct raw EEG signals represented as stacks of one-dimensional (1D) time series data to two-dimensional (2D) EEG frames according to the original electrode position. We then represent a 3D EEG stream by concatenating the 2D EEG frames to the time axis. These 3D reconstructions of the raw EEG signals can be efficiently combined with 3D CNNs, which have shown a remarkable feature representation from spatio-temporal data. Herein, we demonstrate the accuracy of the emotional classification of the proposed method through extensive experiments on the DEAP (a Dataset for Emotion Analysis using EEG, Physiological, and video signals) dataset. Experimental results show that the proposed method achieves a classification accuracy of 99.11%, 99.74%, and 99.73% in the binary classification of valence and arousal, and, in four-class classification, respectively. We investigate the spatio-temporal effectiveness of the proposed method by comparing it to several types of input methods with 2D/3D CNN. We then verify the best performing shape of both the kernel and input data experimentally. We verify that an efficient representation of an EEG and a network that fully takes advantage of the data characteristics can outperform methods that apply handcrafted features.
Keywords