Discover Data (Jan 2025)

A novel multivariate time series dataset of outdoor sport activities

  • Matarmaa Jarno

DOI
https://doi.org/10.1007/s44248-025-00019-5
Journal volume & issue
Vol. 3, no. 1
pp. 1 – 11

Abstract

This study introduces a novel multivariate time series dataset of 228 outdoor sport activities recorded by an individual non-competitive athlete in uncontrolled environments. The dataset includes three features (Heart Rate, Speed, and Altitude) and covers five sport categories: walking, running, skiing, roller-skiing, and biking. The data were collected using two types of Garmin sport watches. The original dataset was pre-processed using typical data cleansing methods such as gap filling and value format transformations. Furthermore, activities were filtered out based on missing sensor values and on domain knowledge of the sport categories. Full-length sequences, varying from 10 min to several hours, were split into equal-length segments of approximately 1 min each. To address the small number of instances, the data were augmented using several consecutive segments from the same activity. However, only a small part of the original data was used, as a tradeoff between computational cost and information gain. The three-dimensional dataset is divided into three parts, with each dimension stored in its own comma-separated values (CSV) file. The dataset aims to provide a unique resource for researchers and practitioners in sports science, human performance analysis, and activity recognition, and to complement the very limited or non-existent publicly available sport activity datasets.
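The segmentation step described above can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the 60-sample segment length (which assumes one sample per second), and the synthetic activity are all hypothetical and serve only to show how a full-length sequence might be cut into equal-length segments, limited to a few consecutive segments, and separated into one table per feature.

```python
from typing import List, Sequence, Tuple

# One sample holds the three features: (heart_rate, speed, altitude).
Sample = Tuple[float, float, float]


def split_into_segments(activity: Sequence[Sample], segment_len: int) -> List[List[Sample]]:
    """Cut one activity into non-overlapping, equal-length segments.

    The incomplete tail (fewer than segment_len samples) is dropped.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    n_full = len(activity) // segment_len
    return [list(activity[i * segment_len:(i + 1) * segment_len]) for i in range(n_full)]


def take_consecutive(segments: List[List[Sample]], max_segments: int) -> List[List[Sample]]:
    """Keep only the first max_segments consecutive segments of an activity."""
    return segments[:max_segments]


def split_by_feature(segments: List[List[Sample]]) -> Tuple[List[List[float]], ...]:
    """Return one 2-D table per feature; each row is one segment."""
    return tuple([[sample[f] for sample in seg] for seg in segments] for f in range(3))


# Hypothetical activity: 150 samples, assumed to be recorded at 1 Hz.
activity = [(120.0 + i % 5, 3.0, 100.0 + 0.1 * i) for i in range(150)]

segments = split_into_segments(activity, segment_len=60)  # assumed ~1 min at 1 Hz
kept = take_consecutive(segments, max_segments=2)
heart_rate, speed, altitude = split_by_feature(kept)

print(len(segments), len(segments[0]))  # prints "2 60"
```

Each of the three returned tables has one row per segment, which matches the one-file-per-dimension layout the abstract describes; writing them to CSV is left out of the sketch.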

Keywords