Scientific Data (Nov 2024)

A benchmark for domain adaptation and generalization in smartphone-based human activity recognition

  • Otávio Napoli,
  • Dami Duarte,
  • Patrick Alves,
  • Darlinne Hubert Palo Soto,
  • Henrique Evangelista de Oliveira,
  • Anderson Rocha,
  • Levy Boccato,
  • Edson Borin

DOI
https://doi.org/10.1038/s41597-024-03951-4
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 16

Abstract

Read online

Abstract Human activity recognition (HAR) using smartphone inertial sensors, like accelerometers and gyroscopes, enhances smartphones’ adaptability and user experience. Data distribution from these sensors is affected by several factors including sensor hardware, software, device placement, user demographics, terrain, and more. Most datasets focus on providing variability in user and (sometimes) device placement, limiting domain adaptation and generalization studies. Consequently, models trained on one dataset often perform poorly on others. Despite many publicly available HAR datasets, cross-dataset generalization remains challenging due to data format incompatibilities, such as differences in measurement units, sampling rates, and label encoding. Hence, we introduce the DAGHAR benchmark, a curated collection of datasets for domain adaptation and generalization studies in smartphone-based HAR. We standardized six datasets in terms of accelerometer units, sampling rate, gravity component, activity labels, user partitioning, and time window size, removing trivial biases while preserving intrinsic differences. This enables controlled evaluation of model generalization capabilities. Additionally, we provide baseline performance metrics from state-of-the-art machine learning models, crucial for comprehensive evaluations of generalization in HAR tasks.