Nature and Science of Sleep (Dec 2021)

Confidence-Based Framework Using Deep Learning for Automated Sleep Stage Scoring

  • Hong JK,
  • Lee T,
  • Delos Reyes RD,
  • Hong J,
  • Tran HH,
  • Lee D,
  • Jung J,
  • Yoon IY

Journal volume & issue
Vol. 13
pp. 2239 – 2250

Abstract

Jung Kyung Hong,1,2,* Taeyoung Lee,3,* Roben Deocampo Delos Reyes,4 Joonki Hong,3,4 Hai Hong Tran,4 Dongheon Lee,4 Jinhwan Jung,4 In-Young Yoon1,2

1Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam, Korea; 2Seoul National University College of Medicine, Seoul, Korea; 3Korea Advanced Institute of Science and Technology, Daejeon, Korea; 4Asleep Inc., Seoul, Korea

*These authors contributed equally to this work

Correspondence: In-Young Yoon, Department of Psychiatry, Seoul National University Bundang Hospital, 82, Gumi-ro 173beon-gil, Bundang-gu, Seongnam-si, Gyeonggi-do, 463-707, Korea. Tel +82-31-787-7433; Fax +82-31-787-4058; Email [email protected]

Jinhwan Jung, R&D Division, Asleep Inc, Asleep, 15, Teheran-ro 82-gil, Gangnam-gu, Seoul, Korea. Tel +82-10-6228-7137; Email [email protected]

Objectives: Automated sleep stage scoring is not yet vigorously used in practice because of the black-box nature and the risk of wrong predictions. The objective of this study was to introduce a confidence-based framework to detect the possibly wrong predictions that would inform clinicians about which epochs would require a manual review and investigate the potential to improve accuracy for automated sleep stage scoring.

Methods: We used 702 polysomnography studies from a local clinical dataset (SNUBH dataset) and 2804 from an open dataset (SHHS dataset) for experiments. We adapted the state-of-the-art TinySleepNet architecture to train the classifier and modified the ConfidNet architecture to train an auxiliary confidence model. For the confidence model, we developed a novel method, Dropout Correct Rate (DCR), and the performance of it was compared with other existing methods.

Results: Confidence estimates (0.754) reflected accuracy (0.758) well in general. The best performance for differentiating correct and wrong predictions was shown when using the DCR method (AUROC: 0.812) compared to the existing approaches which largely failed to detect wrong predictions. By reviewing only 20% of epochs that received the lowest confidence values, the overall accuracy of sleep stage scoring was improved from 76% to 87%. For patients with reduced accuracy (ie, individuals with obesity or severe sleep apnea), the possible improvement range after applying confidence estimation was even greater.

Conclusion: To the best of our knowledge, this is the first study applying confidence estimation on automated sleep stage scoring. Reliable confidence estimates by the DCR method help screen out most of the wrong predictions, which would increase the reliability and interpretability of automated sleep stage scoring.

Keywords: confidence estimation, deep learning, electroencephalography, polysomnography, sleep stages, accuracy improvement
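
The review step described in the abstract (send the least confident epochs to a human scorer, then re-measure accuracy) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce DCR, TinySleepNet, or ConfidNet: the function names, the toy data, and the assumption that every reviewed epoch ends up correctly scored are all hypothetical choices made for illustration.

```python
def accuracy_after_review(correct, confidence, review_fraction):
    """Accuracy if the lowest-confidence epochs are manually corrected.

    correct: list of bools, True where the automated stage matches the reference.
    confidence: list of floats, one confidence estimate per epoch.
    review_fraction: share of epochs (0..1) sent to a human scorer.
    Assumption: a reviewed epoch always ends up correct (an idealization).
    """
    if len(correct) != len(confidence):
        raise ValueError("correct and confidence must have the same length")
    n = len(correct)
    if n == 0:
        raise ValueError("need at least one epoch")
    n_review = int(round(n * review_fraction))
    # Epoch indices ordered from least to most confident.
    order = sorted(range(n), key=lambda i: confidence[i])
    reviewed = set(order[:n_review])
    fixed = [True if i in reviewed else correct[i] for i in range(n)]
    return sum(fixed) / n


def auroc_correct_vs_wrong(correct, confidence):
    """AUROC of confidence as a detector of correct predictions.

    Computed as the fraction of (correct, wrong) epoch pairs in which the
    correct epoch has the higher confidence; ties count as half.
    """
    pos = [c for ok, c in zip(correct, confidence) if ok]
    neg = [c for ok, c in zip(correct, confidence) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one wrong epoch")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, hand-made data (not from the study): 10 epochs, 3 wrong predictions.
correct = [True, True, False, True, False, True, True, True, False, True]
confidence = [0.95, 0.90, 0.30, 0.85, 0.40, 0.80, 0.92, 0.88, 0.75, 0.70]

baseline = accuracy_after_review(correct, confidence, 0.0)
reviewed = accuracy_after_review(correct, confidence, 0.2)
auroc = auroc_correct_vs_wrong(correct, confidence)
print(f"accuracy without review: {baseline:.2f}")
print(f"accuracy after reviewing the lowest 20%: {reviewed:.2f}")
print(f"AUROC (correct vs wrong): {auroc:.3f}")
```

In this toy example the two least confident epochs happen to be wrong, so reviewing them removes two of the three errors. The third error carries a fairly high confidence and is missed, which is the kind of case a better-calibrated confidence model is meant to reduce.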
