Frontiers in Digital Health (Nov 2021)

Reproducible Analysis Pipeline for Data Streams: Open-Source Software to Process Data Collected With Mobile Devices

  • Julio Vega,
  • Meng Li,
  • Kwesi Aguillera,
  • Nikunj Goel,
  • Echhit Joshi,
  • Kirtiraj Khandekar,
  • Krina C. Durica,
  • Abhineeth R. Kunta,
  • Carissa A. Low

DOI
https://doi.org/10.3389/fdgth.2021.769823
Journal volume & issue
Vol. 3

Abstract

Read online

Smartphone and wearable devices are widely used in behavioral and clinical research to collect longitudinal data that, along with ground truth data, are used to create models of human behavior. Mobile sensing researchers often program data processing and analysis code from scratch even though many research teams collect data from similar mobile sensors, platforms, and devices. This leads to significant inefficiency in not being able to replicate and build on others' work, inconsistency in quality of code and results, and lack of transparency when code is not shared alongside publications. We provide an overview of Reproducible Analysis Pipeline for Data Streams (RAPIDS), a reproducible pipeline to standardize the preprocessing, feature extraction, analysis, visualization, and reporting of data streams coming from mobile sensors. RAPIDS is formed by a group of R and Python scripts that are executed on top of reproducible virtual environments, orchestrated by a workflow management system, and organized following a consistent file structure for data science projects. We share open source, documented, extensible and tested code to preprocess, extract, and visualize behavioral features from data collected with any Android or iOS smartphone sensing app as well as Fitbit and Empatica wearable devices. RAPIDS allows researchers to process mobile sensor data in a rigorous and reproducible way. This saves time and effort during the data analysis phase of a project and facilitates sharing analysis workflows alongside publications.

Keywords