IEEE Access (Jan 2022)

On the Generalization of Sleep Apnea Detection Methods Based on Heart Rate Variability and Machine Learning

  • Daniele Padovano,
  • Arturo Martinez-Rodrigo,
  • Jose M. Pastor,
  • Jose J. Rieta,
  • Raul Alcaraz

DOI
https://doi.org/10.1109/ACCESS.2022.3201911
Journal volume & issue
Vol. 10
pp. 92710 – 92725

Abstract

Obstructive sleep apnea (OSA) is a respiratory disorder highly correlated with severe cardiovascular diseases. It has attracted the interest of hundreds of experts aiming to overcome the demanding requirements of polysomnography, the gold standard for its detection. In this regard, a variety of algorithms based on heart rate variability (HRV) features and machine learning (ML) classifiers have recently been proposed for epoch-wise OSA detection from the surface electrocardiogram signal. Many researchers have employed freely available databases to assess their methods in a reproducible way, but most methods were tested only with cross-validation, and some even used a single database for both training and testing. Hence, although some of these methods have reported promising diagnostic accuracy, those values are suspected to be overestimated. The present work analyzes the actual generalization ability of several epoch-wise OSA detectors obtained through a common ML pipeline and typical HRV features. Specifically, the performance of the generated OSA detectors has been compared under two validation approaches, i.e., the widely used epoch-wise $k$-fold cross-validation and the highly recommended external validation, both considering different combinations of well-known public databases. Regardless of the ML classifiers used and the HRV-based features selected, the external validation results were 20 to 40% lower than those obtained with cross-validation in terms of accuracy, sensitivity, and specificity. Consequently, these results suggest that ML-based OSA detectors trained with public databases are still not sufficiently general to be employed in clinical practice. They also suggest that larger, more representative public datasets are mandatory to improve generalization ability, and that external validation is mandatory to obtain a reliable assessment of the true predictive power of these algorithms.
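The contrast between the two validation approaches can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes synthetic two-feature "epochs", a hypothetical database-specific offset that stands in for differences between recordings, and a nearest-centroid classifier swapped in for the ML classifiers of the study. All function names and parameter values are illustrative assumptions.

```python
import random


def make_database(rng, n_per_class, offset):
    """Synthetic epochs with 2 features; `offset` is an assumed
    database-specific shift, not a property of any real database."""
    data = []
    for label, centre in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(centre + offset, 1.0) for _ in range(2)]
            data.append((features, label))
    rng.shuffle(data)
    return data


def fit_centroids(train):
    """Nearest-centroid 'training': the mean feature vector per class."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for features, label in train:
        counts[label] += 1
        for i, value in enumerate(features):
            sums[label][i] += value
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    def sqdist(centre):
        return sum((a - b) ** 2 for a, b in zip(features, centre))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def accuracy(centroids, data):
    hits = sum(predict(centroids, features) == label for features, label in data)
    return hits / len(data)


def kfold_accuracy(data, k):
    """Epoch-wise k-fold cross-validation within a single database."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(accuracy(fit_centroids(train), folds[i]))
    return sum(scores) / k


rng = random.Random(0)
db_a = make_database(rng, n_per_class=200, offset=0.0)  # training database
db_b = make_database(rng, n_per_class=200, offset=1.5)  # unseen database

cv_acc = kfold_accuracy(db_a, k=5)              # train and test on A
ext_acc = accuracy(fit_centroids(db_a), db_b)   # train on A, test on B

print(f"cross-validation accuracy: {cv_acc:.3f}")
print(f"external validation accuracy: {ext_acc:.3f}")
```

In this toy setting, cross-validation never sees the shift between databases, so it scores higher than the test on the unseen database. The size of the gap depends entirely on the assumed offset and says nothing about the figures reported in the study.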

Keywords