Data in Brief (Aug 2024)
A high-dimensional, multi-transceiver channel state information dataset for enhanced human activity recognition
Abstract
Human Activity Recognition (HAR) has emerged as a critical research area due to its extensive applications in various real-world domains. Numerous datasets based on channel state information (CSI) have been established to support the development and evaluation of advanced HAR algorithms. However, existing CSI-based HAR datasets often lack complexity and diversity in the activities they represent, which hinders the design of robust HAR models. These limitations typically manifest as a narrow range of activities or the exclusion of factors that influence real-world CSI measurements, and the resulting scarcity of diverse training data can impede the development of efficient HAR systems. To address these limitations, this paper introduces a dataset that captures spatial diversity through multiple transceiver orientations over a high-dimensional space encompassing a large number of subcarriers. The dataset incorporates a wider range of real-world factors, including an extensive activity range, a spectrum of human movements (encompassing both micro- and macro-movements), variations in body composition, and diverse environmental conditions (noise and interference). The experiment was performed in a controlled laboratory environment with dimensions of 5 m (width) × 8 m (length) × 3 m (height) to capture CSI measurements for various human activities. Four ESP32-S3-DevKitC-1 devices, configured as transceiver pairs with unique Media Access Control (MAC) addresses, collected CSI data according to the Wi-Fi IEEE 802.11n standard. Mounted on tripods at a height of 1.5 m, the transmitter devices (powered by external power banks) were positioned at the north and east and sent multiple Wi-Fi beacons to their respective receivers (connected to laptops via USB for data collection) located at the south and west.
To capture multi-perspective CSI data, all six participants sequentially performed designated activities while standing in the centre of the tripod arrangement for 5 s per sample. The system collected approximately 300–450 packets per sample, with approximately 1200 samples per activity, capturing CSI information across the 166 subcarriers employed in the Wi-Fi IEEE 802.11n standard. By leveraging the richness of this dataset, HAR researchers can develop more robust and generalizable CSI-based HAR models. Compared to traditional HAR approaches, these CSI-based models hold the promise of significantly enhanced accuracy and robustness when deployed in real-world scenarios. This stems from their ability to capture the nuanced dynamics of human movement by analysing wireless channel characteristics from different spatial perspectives (using a two-diagonal ESP32 transceiver configuration) with a higher degree of dimensionality (166 subcarriers).
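The collection figures quoted above imply a packet rate and a per-sample data volume that can be worked out directly. The following is a minimal sketch of that arithmetic, not part of the dataset's tooling. It assumes that the 300–450 packet count refers to a single receiver's stream and that each packet contributes one value per subcarrier; the function names, the zero-padding strategy, and the target length of 450 packets are hypothetical choices made for illustration only.

```python
# Figures taken from the abstract.
SAMPLE_DURATION_S = 5             # seconds per recorded sample
PACKETS_PER_SAMPLE = (300, 450)   # approximate lower and upper bounds
SAMPLES_PER_ACTIVITY = 1200       # approximate
NUM_SUBCARRIERS = 166


def packet_rate_range(packets_range, duration_s):
    """Return the implied (min, max) packet rate in packets per second."""
    low, high = packets_range
    return (low / duration_s, high / duration_s)


def values_per_sample(num_packets, num_subcarriers=NUM_SUBCARRIERS):
    """Number of CSI values in one sample, assuming one value per subcarrier per packet."""
    return num_packets * num_subcarriers


def pad_or_truncate(sample, target_len, num_subcarriers=NUM_SUBCARRIERS):
    """Force a sample (a list of per-packet rows) to a fixed number of packets.

    Hypothetical preprocessing step: because the packet count varies between
    samples, a model with a fixed input size needs some normalisation.
    Zero-padding is only one possible choice.
    """
    if len(sample) >= target_len:
        return sample[:target_len]
    padding = [[0.0] * num_subcarriers for _ in range(target_len - len(sample))]
    return sample + padding


if __name__ == "__main__":
    low_rate, high_rate = packet_rate_range(PACKETS_PER_SAMPLE, SAMPLE_DURATION_S)
    print(f"Implied packet rate: {low_rate:.0f} to {high_rate:.0f} packets/s")

    for n in PACKETS_PER_SAMPLE:
        print(f"{n} packets x {NUM_SUBCARRIERS} subcarriers = {values_per_sample(n)} values")

    # A synthetic 300-packet sample padded to a hypothetical fixed length of 450.
    synthetic = [[1.0] * NUM_SUBCARRIERS for _ in range(300)]
    fixed = pad_or_truncate(synthetic, 450)
    print(f"Padded sample shape: {len(fixed)} x {len(fixed[0])}")
```

Under these assumptions, the stated figures correspond to roughly 60 to 90 packets per second per receiver. The varying packet count is why some form of length normalisation (padding, truncation, or resampling) is usually needed before samples can be batched for a model.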