Methodological Issues in Evaluating Machine Learning Models for EEG Seizure Prediction: Good Cross-Validation Accuracy Does Not Guarantee Generalization to New Patients

Sina Shafiezadeh; Gian Marco Duma; Giovanni Mento; Alberto Danieli; Lisa Antoniazzi; Fiorella Del Popolo Cristaldi; Paolo Bonanni; Alberto Testolin

doi:10.3390/app13074262

Applied Sciences (Mar 2023)

Affiliations

Sina Shafiezadeh: Department of General Psychology, University of Padova, 35131 Padova, Italy
Gian Marco Duma: Epilepsy and Clinical Neurophysiology Unit, Scientific Institute, IRCCS E. Medea, 31015 Conegliano, Italy
Giovanni Mento: Department of General Psychology, University of Padova, 35131 Padova, Italy
Alberto Danieli: Epilepsy and Clinical Neurophysiology Unit, Scientific Institute, IRCCS E. Medea, 31015 Conegliano, Italy
Lisa Antoniazzi: Epilepsy and Clinical Neurophysiology Unit, Scientific Institute, IRCCS E. Medea, 31015 Conegliano, Italy
Fiorella Del Popolo Cristaldi: Department of General Psychology, University of Padova, 35131 Padova, Italy
Paolo Bonanni: Epilepsy and Clinical Neurophysiology Unit, Scientific Institute, IRCCS E. Medea, 31015 Conegliano, Italy
Alberto Testolin: Department of General Psychology, University of Padova, 35131 Padova, Italy

Journal volume & issue: Vol. 13, no. 7, p. 4262

Abstract

There is increasing interest in applying artificial intelligence techniques to forecast epileptic seizures. In particular, machine learning algorithms could extract nonlinear statistical regularities from electroencephalographic (EEG) time series that can anticipate abnormal brain activity. The recent literature reports promising results in seizure detection and prediction tasks using machine and deep learning methods. However, performance evaluation is often based on questionable randomized cross-validation schemes, which can place correlated signals (e.g., EEG data recorded from the same patient during nearby periods of the day) in both the training and test sets. The present study demonstrates that the use of more stringent evaluation strategies, such as those based on leave-one-patient-out partitioning, leads to a drop in accuracy from about 80% to 50% for a standard eXtreme Gradient Boosting (XGBoost) classifier on two different data sets. Our findings suggest that the definition of rigorous evaluation protocols is crucial to ensure the generalizability of predictive models before proceeding to clinical trials.
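
The contrast between randomized cross-validation and leave-one-patient-out partitioning can be illustrated with a small sketch. This is not the authors' pipeline: the data are synthetic stand-ins for EEG feature windows, a pure-Python 1-nearest-neighbour rule replaces the XGBoost classifier named in the abstract, and every function name and parameter value below is hypothetical. By construction, the features carry a session-specific signature but no label information shared across patients, so any accuracy above chance under randomized folds comes from correlated windows of the same session landing in both training and test sets.

```python
import random


def make_windows(n_patients=12, windows_per_session=20, n_features=5, seed=0):
    """Synthetic stand-in for EEG feature windows (assumption, not real data).

    Each patient contributes one 'interictal' (0) and one 'preictal' (1)
    session. Features are a session-specific offset plus noise, so windows
    from the same session are strongly correlated, while nothing about the
    label generalizes from one patient to another.
    """
    rng = random.Random(seed)
    X, y, patient = [], [], []
    for p in range(n_patients):
        for label in (0, 1):
            center = [rng.gauss(0.0, 5.0) for _ in range(n_features)]
            for _ in range(windows_per_session):
                X.append([c + rng.gauss(0.0, 0.5) for c in center])
                y.append(label)
                patient.append(p)
    return X, y, patient


def predict_1nn(train_idx, test_idx, X, y):
    """Give each test window the label of its nearest training window."""
    preds = []
    for i in test_idx:
        nearest = min(
            train_idx,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])),
        )
        preds.append(y[nearest])
    return preds


def evaluate(folds, X, y):
    """Pooled accuracy over a list of test-index folds."""
    correct, total = 0, 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [j for j in range(len(X)) if j not in held_out]
        preds = predict_1nn(train_idx, test_idx, X, y)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
        total += len(test_idx)
    return correct / total


def random_folds(n, k=5, seed=0):
    """Shuffled k-fold split that ignores which patient a window came from."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def patient_folds(patient):
    """Leave-one-patient-out: each fold holds all windows of one patient."""
    return [
        [i for i, p in enumerate(patient) if p == held]
        for held in sorted(set(patient))
    ]


X, y, patient = make_windows()
random_cv_acc = evaluate(random_folds(len(X)), X, y)
lopo_acc = evaluate(patient_folds(patient), X, y)
print(f"randomized 5-fold accuracy:     {random_cv_acc:.2f}")
print(f"leave-one-patient-out accuracy: {lopo_acc:.2f}")
```

Under these assumptions, the randomized split should score close to perfect because each test window has near-duplicates from its own session in the training set, while the patient-wise split should fall toward chance. The exact numbers depend on the seed and the toy parameters and say nothing about the magnitudes reported in the study.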

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
