IEEE Access (Jan 2023)
A Systematic Review of Electroencephalography Open Datasets and Their Usage With Deep Learning Models
Abstract
Data are the main headache for machine learning, both because of their varied nature and because of their limited availability. The medical field combines both problems: tables, images, text, or signals that are difficult to acquire due to the number of patients, the complexity and duration of acquisition, or ethical constraints. Open datasets are the best option for researchers in this field, and electroencephalograms (EEG) are a good example of this situation. This paper identifies the primary open datasets of electroencephalography tests and how they are used in deep learning (DL) models. The aim is to provide structured information that researchers in the field (both physicians and computer scientists) can consult to learn which datasets are available, what characteristics they have, and which deep learning models could be applied to them. The process followed the PRISMA methodology for systematic reviews, applying inclusion and exclusion criteria to obtain a set of high-quality papers whose datasets were then analyzed. The databases searched were Scopus, PubMed, Web of Science (WOS), ScienceDirect, IEEE Xplore, and SpringerLink. In total, 37 papers were selected, covering 30 datasets. The DL models used in the papers and the characteristics of the datasets were then analyzed statistically using different measures and graphs. The most relevant conclusion is the widespread use of convolutional neural networks (CNNs), the least innovative of the models considered, as the main tool for EEG data analysis. In contrast, hybrid models and the family of recurrent neural networks (RNNs) are used in cases of brain stimuli, classification of fatigue levels, and diagnosis of diseases.
Regarding the features of the datasets, we show that compiling these data is difficult because of the number of tests involved, and that the minimum number of channels or the sampling frequency recommended to obtain good model accuracy should be studied.
Keywords