IET Wireless Sensor Systems (Oct 2022)

Self‐supervised multimodal fusion transformer for passive activity recognition

  • Armand K. Koupai,
  • Mohammud J. Bocus,
  • Raul Santos‐Rodriguez,
  • Robert J. Piechocki,
  • Ryan McConville

DOI
https://doi.org/10.1049/wss2.12044
Journal volume & issue
Vol. 12, no. 5-6
pp. 149–160

Abstract


The pervasiveness of Wi‐Fi signals provides significant opportunities for human sensing and activity recognition in fields such as healthcare. The sensors most commonly used for passive Wi‐Fi sensing are based on passive Wi‐Fi radar (PWR) and channel state information (CSI) data; however, current systems do not effectively exploit the information acquired through multiple sensors to recognise the different activities. In this study, new properties of the Transformer architecture for multimodal sensor fusion are explored. Different signal processing techniques are used to extract multiple image‐based features from PWR and CSI data, such as spectrograms, scalograms and Markov transition fields (MTF). The Fusion Transformer, an attention‐based model for multimodal and multi‐sensor fusion, is first proposed. Experimental results show that the Fusion Transformer approach can achieve competitive results compared to a ResNet architecture, but with far fewer resources. To further improve the model, a simple and effective framework for multimodal and multi‐sensor self‐supervised learning (SSL) is proposed. The self‐supervised Fusion Transformer outperforms the baselines, achieving a macro F1‐score of 95.9%. Finally, this study shows that the approach significantly outperforms the others when trained with between 1% (2 min) and 20% (40 min) of labelled training data.
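Of the image‐based features named in the abstract, the Markov transition field is the least standard; the sketch below illustrates the general idea in plain numpy. It is not the authors' implementation — the function name, the quantile-binning strategy and the number of bins are illustrative assumptions.

```python
import numpy as np

def markov_transition_field(x, n_bins=8):
    """Turn a 1-D time series into an MTF image (illustrative sketch).

    The series is discretised into quantile bins, a first-order Markov
    transition matrix W is estimated from consecutive samples, and the
    MTF assigns M[i, j] = W[bin(x[i]), bin(x[j])].
    """
    x = np.asarray(x, dtype=float)
    # Quantile binning: interior bin edges at equal-probability cut points.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)  # bin index of each sample, in [0, n_bins)
    # Estimate the transition matrix from adjacent sample pairs.
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(bins[:-1], bins[1:]):
        W[a, b] += 1
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # Spread the transition probabilities over all time-step pairs.
    return W[np.ix_(bins, bins)]
```

The resulting n×n probability image can then be fed to an image model (e.g. a Transformer patch encoder) alongside spectrograms and scalograms of the same window.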

Keywords