Nature and Science of Sleep (May 2024)

Bridging AI and Clinical Practice: Integrating Automated Sleep Scoring Algorithm with Uncertainty-Guided Physician Review

  • Bechny M,
  • Monachino G,
  • Fiorillo L,
  • van der Meer J,
  • Schmidt MH,
  • Bassetti CLA,
  • Tzovara A,
  • Faraci FD

Journal volume & issue
Vol. 16
pp. 555 – 572

Abstract

Michal Bechny,1,2 Giuliana Monachino,1,2 Luigi Fiorillo,2 Julia van der Meer,3 Markus H Schmidt,3,4 Claudio LA Bassetti,3 Athina Tzovara,1,3 Francesca D Faraci2

1 Institute of Computer Science, University of Bern, Bern, Switzerland
2 Institute of Digital Technologies for Personalized Healthcare (Meditech), University of Applied Sciences and Arts of Southern Switzerland, Lugano, Switzerland
3 Department of Neurology, University Hospital of Bern, Bern, Switzerland
4 Ohio Sleep Medicine Institute, Dublin, OH, USA

Correspondence: Michal Bechny, Institute of Digital Technologies for Personalized Healthcare, East Campus USI-SUPSI, Via la Santa 1, CH-6962 Lugano-Viganello, Lugano, Switzerland, Tel +41 (0)58 666 65 10, Email [email protected]

Purpose: This study aims to enhance the clinical use of automated sleep-scoring algorithms by incorporating uncertainty estimation to efficiently assist clinicians in the manual review of predicted hypnograms, a necessity given the notable inter-scorer variability inherent in polysomnography (PSG) databases. We focus on the extent of review required to achieve predefined agreement levels, examining both in-domain (ID) and out-of-domain (OOD) data and considering subjects' diagnoses.

Patients and Methods: A total of 19,578 PSGs from 13 open-access databases were used to train U-Sleep, a state-of-the-art sleep-scoring algorithm. We leveraged a comprehensive clinical database of an additional 8832 PSGs, covering a full spectrum of ages (0–91 years) and sleep disorders, to refine U-Sleep and to evaluate different uncertainty-quantification approaches, including our novel confidence network. The ID data consisted of PSGs scored by over 50 physicians, and the two OOD sets comprised recordings each scored by a unique senior physician.

Results: U-Sleep demonstrated robust performance, with Cohen's kappa (κ) of 76.2% on ID and 73.8–78.8% on OOD data. The confidence network excelled at identifying uncertain predictions, achieving AUROC scores of 85.7% on ID and 82.5–85.6% on OOD data. Independently of sleep-disorder status, statistical evaluations revealed significant differences in confidence scores between predictions that agreed and those that disagreed with the physician's scoring, and significant correlations of confidence scores with classification performance metrics. To achieve κ ≥ 90% with physician intervention, fewer than 29.0% of epochs, those flagged as most uncertain, had to be examined, substantially reducing physicians' workload and facilitating near-perfect agreement.

Conclusion: Inter-scorer variability limits the accuracy of scoring algorithms to approximately 80%. By integrating uncertainty estimation with U-Sleep, we enhance the review of predicted hypnograms so that they align with the scoring preferences of the responsible physician. Validated across ID and OOD data and various sleep disorders, our approach offers a strategy to boost the usability of automated scoring tools in clinical settings.

Keywords: automated sleep scoring, uncertainty quantification, explainable AI, polysomnography, sleep medicine
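
The uncertainty-guided review reported in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy stage labels, and the confidence values are hypothetical, and it assumes that a physician who reviews an epoch replaces the predicted stage with their own label. Epochs are reviewed in order of increasing confidence until Cohen's kappa against the physician's hypnogram reaches the target.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa between two equally long label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the marginal label frequencies of both raters.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:  # both sequences use one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


def review_fraction_for_kappa(predicted, reference, confidence, target=0.90):
    """Fraction of epochs to review, least confident first, to reach `target`.

    Assumption (hypothetical workflow): reviewing an epoch replaces the
    predicted stage with the physician's reference label.
    """
    corrected = list(predicted)
    if cohen_kappa(corrected, reference) >= target:
        return 0.0
    # Indices of epochs sorted from least to most confident.
    order = sorted(range(len(predicted)), key=lambda i: confidence[i])
    for n_reviewed, i in enumerate(order, start=1):
        corrected[i] = reference[i]
        if cohen_kappa(corrected, reference) >= target:
            return n_reviewed / len(predicted)
    return 1.0


# Toy data (hypothetical): ten epochs, two of them mis-scored by the model.
reference = ["W", "N1", "N2", "N3", "REM", "N2", "N2", "W", "REM", "N3"]
predicted = ["W", "N2", "N2", "N3", "REM", "N2", "N1", "W", "REM", "N3"]
confidence = [0.9, 0.3, 0.8, 0.95, 0.9, 0.85, 0.4, 0.9, 0.9, 0.6]

fraction = review_fraction_for_kappa(predicted, reference, confidence)
print(f"Epochs to review for kappa >= 0.90: {fraction:.0%}")
```

In this toy case the two mis-scored epochs also carry the lowest confidence, so reviewing them is enough. With a less informative confidence score, more epochs would have to be reviewed before the target is met.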