Comparison of a Deep Learning-Based Pose Estimation System to Marker-Based and Kinect Systems in Exergaming for Balance Training

Elise Klæbo Vonstad; Xiaomeng Su; Beatrix Vereijken; Kerstin Bach; Jan Harald Nilsen

doi:10.3390/s20236940

Sensors (Dec 2020)

Affiliations

Elise Klæbo Vonstad: Department of Computer Science, Norwegian University of Science and Technology, 7034 Trondheim, Norway
Xiaomeng Su: Department of Computer Science, Norwegian University of Science and Technology, 7034 Trondheim, Norway
Beatrix Vereijken: Department of Neuromedicine and Movement Science, Norwegian University of Science and Technology, 7030 Trondheim, Norway
Kerstin Bach: Department of Computer Science, Norwegian University of Science and Technology, 7034 Trondheim, Norway
Jan Harald Nilsen: Department of Computer Science, Norwegian University of Science and Technology, 7034 Trondheim, Norway

DOI: https://doi.org/10.3390/s20236940
Journal volume & issue: Vol. 20, no. 23, p. 6940

Abstract

Using standard digital cameras in combination with deep learning (DL) for pose estimation is promising for the in-home and independent use of exercise games (exergames). It remains to be investigated to what extent such DL-based systems can provide satisfactory accuracy on exergame-relevant measures. Our study assesses temporal variation (i.e., variability) in body segment lengths when using a deep learning image processing tool (DeepLabCut, DLC) on two-dimensional (2D) video. This variability is then compared with that of a gold-standard, marker-based three-dimensional motion capture system (3DMoCap, Qualisys AB) and a 3D RGB-depth camera system (Kinect V2, Microsoft Inc.). Data were collected simultaneously from all three systems while participants (N = 12) played a custom balance training exergame. The DLC pose estimation model was pre-trained on a large-scale dataset (ImageNet) and optimized with context-specific pose-annotated images. Wilcoxon’s signed-rank test was performed to assess the statistical significance of the differences in variability between systems. The results showed that the DLC method performs comparably to the Kinect and, in some segments, even to the 3DMoCap gold-standard system with regard to variability. These results are promising for making exergames more accessible and easier to use, thereby increasing their availability for in-home exercise.
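
As an illustration of the kind of measure the abstract describes, the following is a minimal sketch and not the authors' implementation. It assumes that segment length is the per-frame Euclidean distance between two tracked joints and that variability is the standard deviation of that length over time; the abstract does not specify the exact measure. The function names are hypothetical. The last function computes only the Wilcoxon signed-rank statistic for paired values, without a p-value.

```python
import math
import statistics


def segment_lengths(joint_a, joint_b):
    """Per-frame Euclidean distance between two joint trajectories.

    Each trajectory is a sequence of (x, y) or (x, y, z) coordinates.
    """
    return [math.dist(a, b) for a, b in zip(joint_a, joint_b)]


def segment_variability(joint_a, joint_b):
    """Standard deviation of segment length over time.

    Assumption: the abstract does not state which variability measure
    was used; the sample standard deviation is one plausible choice.
    """
    return statistics.stdev(segment_lengths(joint_a, joint_b))


def wilcoxon_statistic(x, y):
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired samples.

    Zero differences are dropped; tied absolute differences receive
    the average of the ranks they span.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(w_plus, w_minus)
```

Under these assumptions, each participant would contribute one variability value per segment and system, and the paired values from two systems would be passed to `wilcoxon_statistic`. In practice, a library routine such as `scipy.stats.wilcoxon` would also supply the p-value.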

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
