JMIR mHealth and uHealth (Nov 2023)

Accuracy of 11 Wearable, Nearable, and Airable Consumer Sleep Trackers: Prospective Multicenter Validation Study

  • Taeyoung Lee,
  • Younghoon Cho,
  • Kwang Su Cha,
  • Jinhwan Jung,
  • Jungim Cho,
  • Hyunggug Kim,
  • Daewoo Kim,
  • Joonki Hong,
  • Dongheon Lee,
  • Moonsik Keum,
  • Clete A Kushida,
  • In-Young Yoon,
  • Jeong-Whun Kim

DOI
https://doi.org/10.2196/50983
Journal volume & issue
Vol. 11
p. e50983

Abstract

Background
Consumer sleep trackers (CSTs) have gained significant popularity because they enable individuals to conveniently monitor and analyze their sleep. However, few studies have comprehensively validated the performance of widely used CSTs. Our study therefore investigated popular CSTs based on various biosignals and algorithms by assessing their agreement with polysomnography.

Objective
This study aimed to validate the accuracy of various types of CSTs through a comparison with in-lab polysomnography. Additionally, by including widely used CSTs and conducting a multicenter study with a large sample size, this study sought to provide comprehensive insights into the performance and applicability of these CSTs for sleep monitoring in a hospital environment.

Methods
The study analyzed 11 commercially available CSTs, including 5 wearables (Google Pixel Watch, Galaxy Watch 5, Fitbit Sense 2, Apple Watch 8, and Oura Ring 3), 3 nearables (Withings Sleep Tracking Mat, Google Nest Hub 2, and Amazon Halo Rise), and 3 airables (SleepRoutine, SleepScore, and Pillow). The 11 CSTs were divided into 2 groups, ensuring maximum inclusion while avoiding interference between the CSTs within each group. Each group (comprising 8 CSTs) was compared with polysomnography.

Results
The study enrolled 75 participants from a tertiary hospital and a primary sleep-specialized clinic in Korea. Across the 2 centers, we collected a total of 3890 hours of sleep sessions based on 11 CSTs, along with 543 hours of polysomnography recordings. Each CST contributed an average of 353 hours of sleep recordings. We analyzed a total of 349,114 epochs from the 11 CSTs compared with polysomnography, where epoch-by-epoch agreement in sleep stage classification showed substantial performance variation. More specifically, the highest macro F1 score was 0.69, while the lowest macro F1 score was 0.26. The sleep trackers performed differently across sleep stages, with SleepRoutine excelling in the wake and rapid eye movement stages, and wearables like Google Pixel Watch and Fitbit Sense 2 showing superiority in the deep stage. There was a distinct trend in sleep measure estimation according to the type of device. Wearables showed high proportional bias in sleep efficiency, while nearables exhibited high proportional bias in sleep latency. Subgroup analyses of sleep trackers revealed variations in macro F1 scores based on factors such as BMI, sleep efficiency, and apnea-hypopnea index, while the differences between male and female subgroups were minimal.

Conclusions
Our study showed that among the 11 CSTs examined, specific CSTs showed substantial agreement with polysomnography, indicating their potential application in sleep monitoring, while other CSTs were partially consistent with polysomnography. This study offers insights into the strengths of CSTs within the 3 different classes for individuals interested in wellness who wish to understand and proactively manage their own sleep.
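
The macro F1 score reported in the Results is the unweighted mean of the per-stage F1 scores, so a rare stage counts as much as a common one. The following is a minimal sketch of that calculation for epoch-by-epoch comparison, not the authors' analysis code. The function name `macro_f1`, the `STAGES` tuple, and the four-stage labeling (the "light" stage in particular) are assumptions made for illustration; the abstract names only the wake, deep, and rapid eye movement stages.

```python
from collections import Counter

# Assumed four-stage labeling; the abstract does not list the stage set.
STAGES = ("wake", "light", "deep", "rem")


def macro_f1(reference, predicted, labels=STAGES):
    """Return the unweighted mean of per-stage F1 over paired epochs.

    `reference` holds the polysomnography stage for each epoch and
    `predicted` holds the tracker's stage for the same epoch.
    """
    if len(reference) != len(predicted):
        raise ValueError("epoch sequences must have equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, pred in zip(reference, predicted):
        if ref == pred:
            tp[ref] += 1   # tracker agrees with polysomnography
        else:
            fn[ref] += 1   # true stage was missed
            fp[pred] += 1  # wrong stage was reported

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A stage that never occurs and is never predicted scores 0 here.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with six epochs (made-up labels, not study data).
psg = ["wake", "light", "light", "deep", "rem", "rem"]
cst = ["wake", "light", "deep", "deep", "rem", "light"]
print(round(macro_f1(psg, cst), 3))  # 0.708
```

Because every stage contributes equally to the average, a tracker that rarely detects the deep stage is penalized even when deep sleep makes up a small share of the night.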