Informatics in Medicine Unlocked (Jan 2021)
A practical method for determining automated EEG interpretation software performance on continuous Video-EEG monitoring data
Abstract
Despite evidence suggesting that as many as 30% of hospitalized patients with altered mental status may be having subclinical seizures for which treatment may improve patient outcomes, the adoption of continuous video electroencephalography (EEG) monitoring (cvEEG) in non-academic centers has remained slow. With continued progress in EEG networking technology and telemedicine, the traditional barriers to the use of cvEEG in community hospitals may soon be removed, leading to significant increases in the volume of cvEEG data that must be reviewed. The application of computer analytics to automate cvEEG review has been ongoing for many years. One solution has been the use of graphic trends to allow more rapid review and assessment of the content of cvEEG data over long periods of time. However, high false positive rates, a lack of specificity, and the requirement for frequent review of the raw cvEEG data to confirm the results have limited the use of this modality as a bedside monitoring tool. Truly automated cvEEG readers will therefore need to be developed. Here we address the challenges of assessing the performance of such automated readers using the traditional core measures of performance (sensitivity, specificity, false positive/negative rates, and diagnostic accuracy) and introduce an alternative approach to measuring and assessing performance that focuses on agreement. Our work demonstrates that traditional approaches to objectively assessing the agreement and performance of automated readers are difficult to apply and may not provide accurate estimates of the real-world performance of computer analytic interpretations of cvEEG data. We introduce our “Pure Gold” approach to the assessment of automated cvEEG reading tools. This novel approach uses compiled EEG segments, each representing a single target pattern, to measure true error rates for the automated reader; visual graphical representations of automated reader outputs for comparison with human reviews of the same data; and agreement comparisons with correction for human variation. We suggest this approach represents a better method for performance assessment than the traditional metrics of sensitivity and specificity.
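For concreteness, the sketch below (not part of the published work) shows how the traditional confusion-matrix measures named above are computed, alongside Cohen's kappa, one standard agreement statistic that corrects for chance agreement between two readers. All counts are hypothetical.

```python
# Minimal sketch of the traditional performance metrics and a
# chance-corrected agreement statistic. Counts are hypothetical.

def traditional_metrics(tp, fp, tn, fn):
    """Classical confusion-matrix measures for a binary reader."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    fpr = fp / (fp + tn)                  # false positive rate
    fnr = fn / (fn + tp)                  # false negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, fpr, fnr, accuracy

def cohens_kappa(tp, fp, tn, fn):
    """Agreement between two readers, corrected for chance agreement."""
    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    # Expected agreement if the two readers labeled segments independently:
    # product of marginal positive rates plus product of marginal negatives.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return (observed - expected) / (1 - expected)

if __name__ == "__main__":
    # Hypothetical counts: automated reader vs. human review over
    # labeled cvEEG segments (target pattern present / absent).
    tp, fp, tn, fn = 42, 18, 120, 20
    sens, spec, fpr, fnr, acc = traditional_metrics(tp, fp, tn, fn)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
          f"FPR={fpr:.2f} FNR={fnr:.2f} accuracy={acc:.2f}")
    print(f"Cohen's kappa={cohens_kappa(tp, fp, tn, fn):.2f}")
```

Note that kappa, unlike raw percent agreement, discounts the agreement two readers would reach by chance alone, which is why chance-corrected statistics of this kind are commonly used when comparing an automated reader against variable human reviews.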