Towards trustworthy seizure onset detection using workflow notes

Khaled Saab; Siyi Tang; Mohamed Taha; Christopher Lee-Messer; Christopher Ré; Daniel L. Rubin

doi:10.1038/s41746-024-01008-9

npj Digital Medicine (Feb 2024)

Affiliations

Khaled Saab: Department of Electrical Engineering, Stanford University
Siyi Tang: Department of Electrical Engineering, Stanford University
Mohamed Taha: Department of Neurology, Stanford University
Christopher Lee-Messer: Department of Child Neurology, Stanford University
Christopher Ré: Department of Computer Science, Stanford University
Daniel L. Rubin: Department of Biomedical Data Science, Radiology, and Medicine, Stanford University

Journal volume & issue: Vol. 7, no. 1
pp. 1 – 9

Abstract

A major barrier to deploying healthcare AI is trustworthiness. One form of trustworthiness is a model’s robustness across subgroups: while models may exhibit expert-level performance on aggregate metrics, they often rely on non-causal features, leading to errors in hidden subgroups. To take a step closer towards trustworthy seizure onset detection from EEG, we propose to leverage annotations that are produced by healthcare personnel in routine clinical workflows (which we refer to as workflow notes) and that include multiple event descriptions beyond seizures. Using workflow notes, we first show that by scaling training data to 68,920 EEG hours, seizure onset detection performance significantly improves by 12.3 AUROC (Area Under the Receiver Operating Characteristic) points compared to relying on smaller training sets with gold-standard labels. Second, we reveal that our binary seizure onset detection model underperforms on clinically relevant subgroups (e.g., by a margin of up to 6.5 AUROC points between pediatrics and adults), while having significantly higher FPRs (False Positive Rates) on EEG clips showing non-epileptiform abnormalities (+19 FPR points). To improve model robustness to hidden subgroups, we train a multilabel model that classifies 26 attributes other than seizures (e.g., spikes and movement artifacts) and significantly improve overall performance (+5.9 AUROC points) while greatly improving performance among subgroups (up to +8.3 AUROC points) and decreasing false positives on non-epileptiform abnormalities (by 8 FPR points). Finally, we find that our multilabel model improves clinical utility (false positives per 24 EEG hours) by a factor of 2.

Published in npj Digital Medicine

ISSN: 2398-6352 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.nature.com/npjdigitalmed/
