IEEE Transactions on Neural Systems and Rehabilitation Engineering (Jan 2024)

Multi-Modal Sleep Stage Classification With Two-Stream Encoder-Decoder

  • Zhao Zhang,
  • Bor-Shyh Lin,
  • Chih-Wei Peng,
  • Bor-Shing Lin

DOI
https://doi.org/10.1109/TNSRE.2024.3394738
Journal volume & issue
Vol. 32
pp. 2096 – 2105

Abstract


Sleep staging serves as a fundamental assessment for sleep quality measurement and sleep disorder diagnosis. Although current deep learning approaches have successfully integrated multi-modal sleep signals, enhancing the accuracy of automatic sleep staging, several challenges remain: 1) optimizing the utilization of complementary multi-modal information, 2) effectively extracting both long- and short-range temporal features of sleep information, and 3) addressing the class imbalance problem in sleep data. To address these challenges, this paper proposes a two-stream encoder-decoder network, named TSEDSleepNet, which is inspired by the depth-sensitive attention and automatic multi-modal fusion (DSA2F) framework. In TSEDSleepNet, a two-stream encoder extracts multiscale features from electrooculogram (EOG) and electroencephalogram (EEG) signals, and a self-attention mechanism fuses these multiscale features into multi-modal salient features. Subsequently, a coarser-scale construction module (CSCM) extracts and constructs multi-resolution features from the multiscale and salient features. A Transformer module is then applied to capture both long- and short-range temporal features from the multi-resolution features. Finally, the long- and short-range temporal features are restored with low-layer details and mapped to the predicted classification results. Additionally, the Lovász loss function is applied to alleviate the class imbalance problem in sleep datasets. The proposed method was tested on the Sleep-EDF-39 and Sleep-EDF-153 datasets, achieving classification accuracies of 88.9% and 85.2% and macro-F1 scores of 84.8% and 79.7%, respectively, outperforming conventional baseline models. These results highlight the efficacy of the proposed method in fusing multi-modal information.
This method has potential for application as an adjunct tool for diagnosing sleep disorders.
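The self-attention fusion step described in the abstract can be illustrated with a minimal, purely schematic sketch. Everything here is an assumption for illustration: the token shapes, the function names (`scaled_dot_product_attention`, `fuse_modalities`), and the concatenate-then-attend fusion strategy are hypothetical stand-ins, not the paper's actual architecture or code.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Standard scaled dot-product attention over (tokens, dim) arrays.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v

def fuse_modalities(eeg_feat, eog_feat):
    # Hypothetical fusion: stack per-modality feature tokens and let
    # self-attention weigh cross-modal complementarity.
    tokens = np.concatenate([eeg_feat, eog_feat], axis=0)  # (2T, D)
    return scaled_dot_product_attention(tokens, tokens, tokens)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((30, 16))  # 30 EEG feature tokens, 16-dim (assumed)
eog = rng.standard_normal((30, 16))  # 30 EOG feature tokens, 16-dim (assumed)
fused = fuse_modalities(eeg, eog)
print(fused.shape)  # fused multi-modal salient features
```

In the actual model, the fused salient features would then feed the CSCM and Transformer stages; this sketch only shows how attention can mix tokens from two signal streams in one pass.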

Keywords