IEEE Access (Jan 2018)

The University of Sussex-Huawei Locomotion and Transportation Dataset for Multimodal Analytics With Mobile Devices

  • Hristijan Gjoreski,
  • Mathias Ciliberto,
  • Lin Wang,
  • Francisco Javier Ordonez Morales,
  • Sami Mekki,
  • Stefan Valentin,
  • Daniel Roggen

DOI
https://doi.org/10.1109/ACCESS.2018.2858933
Journal volume & issue
Vol. 6
pp. 42592 – 42604

Abstract

Scientific advances build on reproducible research, which needs publicly available benchmark data sets. The computer vision and speech recognition communities have led the way in establishing benchmark data sets. Far fewer data sets are available in mobile computing, especially for rich locomotion and transportation analytics. This paper presents a highly versatile and precisely annotated large-scale data set of smartphone sensor data for multimodal locomotion and transportation analytics of mobile users. The data set comprises seven months of measurements, collected from all sensors of four smartphones carried at typical body locations, including the images of a body-worn camera, while three participants used eight different modes of transportation in the south-east of the U.K., including in London. In total, 28 context labels were annotated, including transportation mode, participant's posture, inside/outside location, road conditions, traffic conditions, presence in tunnels, social interactions, and having meals. The collected sensor data exceeds 950 GB, which corresponds to 2812 h of labeled data and 17 562 km of traveled distance. We present how we set up the data collection, including the equipment used and the experimental protocol. We discuss the data set, including the data curation process and the analysis of the annotations and the sensor data. We discuss the challenges encountered and present the lessons learned and some of the best practices we developed to ensure high-quality data collection and annotation. We discuss the potential applications that can be developed using this large-scale data set. In particular, we present how a machine-learning system can use this data set to automatically recognize modes of transportation. Many other research questions related to transportation analytics, activity recognition, radio signal propagation, and mobility modeling can be addressed through this data set. The full data set is being made available to the community, and a thorough preview is already published.

Keywords