BMC Medical Informatics and Decision Making (Oct 2023)

Application and interpretation of machine learning models in predicting the risk of severe obstructive sleep apnea in adults

  • Yewen Shi,
  • Yitong Zhang,
  • Zine Cao,
  • Lina Ma,
  • Yuqi Yuan,
  • Xiaoxin Niu,
  • Yonglong Su,
  • Yushan Xie,
  • Xi Chen,
  • Liang Xing,
  • Xinhong Hei,
  • Haiqin Liu,
  • Shinan Wu,
  • Wenle Li,
  • Xiaoyong Ren

DOI
https://doi.org/10.1186/s12911-023-02331-z
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 15

Abstract

Background: Obstructive sleep apnea (OSA) is a globally prevalent disease that is complex to diagnose. Severe OSA is associated with multi-system dysfunction. We aimed to develop an interpretable machine learning (ML) model for predicting the risk of severe OSA and analyzing the risk factors based on clinical characteristics and questionnaires.

Methods: This was a retrospective study comprising 1656 subjects who presented and underwent polysomnography (PSG) between 2018 and 2021. A total of 23 variables were included, and after univariate analysis, 15 variables were selected for further preprocessing. Six types of classification models were used to evaluate the ability to predict severe OSA, namely logistic regression (LR), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), bootstrapped aggregating (Bagging), and multilayer perceptron (MLP). For all models, the area under the receiver operating characteristic curve (AUC) was calculated as the performance metric. We also drew SHapley Additive exPlanations (SHAP) plots to interpret predictive results and to analyze the relative importance of risk factors. An online calculator was developed to estimate the risk of severe OSA in individuals.

Results: Among the enrolled subjects, 61.47% (1018/1656) were diagnosed with severe OSA. Multivariate LR analysis showed that 10 of 23 variables were independent risk factors for severe OSA. The GBM model showed the best performance (AUC = 0.857, accuracy = 0.766, sensitivity = 0.798, specificity = 0.734). An online calculator was developed to estimate the risk of severe OSA based on the GBM model. Finally, waist circumference, neck circumference, the Epworth Sleepiness Scale, age, and the Berlin questionnaire were revealed by the SHAP plot as the top five critical variables contributing to the diagnosis of severe OSA. Additionally, two typical cases were analyzed to interpret the contribution of each variable to the outcome prediction in a single patient.

Conclusions: We established six risk prediction models for severe OSA using ML algorithms. Among them, the GBM model performed best. The model facilitates individualized assessment and further clinical strategies for patients with suspected severe OSA. This will help to identify patients with severe OSA as early as possible and ensure their timely treatment.

Trial registration: Retrospectively registered.
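
The abstract reports AUC, accuracy, sensitivity, and specificity for each model. As a rough illustration of how these four metrics are defined for a binary "severe OSA" label, the sketch below computes them in plain Python. The labels, scores, the 0.5 decision threshold, and the function names are all hypothetical and chosen for illustration; they are not the study's data, cut-off, or code.

```python
# Illustrative sketch only: toy labels and scores, not data from the study.

def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels (1 = severe OSA)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

def auc_by_ranking(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ground truth and predicted probabilities for eight subjects.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.7, 0.6, 0.55]

# Assumed threshold of 0.5 to turn probabilities into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = auc_by_ranking(y_true, scores)
print(metrics)
print("AUC:", auc)
```

Unlike accuracy, sensitivity, and specificity, the AUC does not depend on the chosen threshold, which is one reason it is commonly used to compare classifiers.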

Keywords