IEEE Access (Jan 2023)

Accuracy Comparison of CNN, LSTM, and Transformer for Activity Recognition Using IMU and Visual Markers

  • Maria Fernanda Trujillo-Guerrero,
  • Stadyn Roman-Niemes,
  • Milagros Jaen-Vargas,
  • Alfonso Cadiz,
  • Ricardo Fonseca,
  • Jose Javier Serrano-Olmedo

DOI
https://doi.org/10.1109/ACCESS.2023.3318563
Journal volume & issue
Vol. 11
pp. 106650 – 106669

Abstract

Human activity recognition (HAR) has applications ranging from security to healthcare. Typically, these systems comprise a data acquisition stage and an activity recognition model. In this work, we compared the accuracy of two acquisition systems: Inertial Measurement Units (IMUs) and Movement Analysis Systems (MAS). We trained models to recognize arm exercises using state-of-the-art deep learning architectures and compared their accuracy. MAS uses a camera array and reflective markers; IMUs use accelerometers, gyroscopes, and magnetometers. Sensors from both systems were attached to different locations on the upper limb. We captured and annotated 3 datasets, each recorded with both systems simultaneously. For activity recognition, we trained 8 architectures, each with a different configuration of operations and layers. The best architectures combined CNN, LSTM, and Transformer layers, achieving test accuracies from 89% to 99% on average. We evaluated how feature selection reduced the number of sensors required. We found that both IMU and MAS data could correctly distinguish the arm exercises. Placing CNN layers at the beginning produced better accuracy on challenging datasets. IMUs had advantages over other acquisition systems for activity recognition. We analyzed the relations between model accuracy, signal waveforms, signal correlation, sampling rate, exercise duration, and window size. Finally, we proposed the use of a single IMU located at the wrist and variable-size window extraction.
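The window extraction mentioned at the end of the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the function name, parameters, and channel layout are assumptions, and the window size is simply treated as a per-exercise variable rather than a fixed length.

```python
import numpy as np

def extract_windows(signal, window_size, step):
    """Slice a (T, C) multichannel IMU recording into overlapping windows.

    window_size is passed per call, so it can be matched to each
    exercise's duration (the variable-size idea) instead of being fixed.
    Returns an array of shape (num_windows, window_size, C).
    """
    T = signal.shape[0]
    starts = range(0, T - window_size + 1, step)
    return np.stack([signal[s:s + window_size] for s in starts])

# Hypothetical example: 1000 samples, 9 IMU channels
# (3 accelerometer + 3 gyroscope + 3 magnetometer axes).
sig = np.zeros((1000, 9))
windows = extract_windows(sig, window_size=200, step=100)
print(windows.shape)  # -> (9, 200, 9)
```

Each window would then be fed to the classifier as one training or inference sample; a 50% overlap (step = window_size // 2) is a common choice in HAR pipelines, though the paper's exact overlap is not stated in the abstract.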

Keywords