IEEE Access (Jan 2021)
Multi-Modal Cross Learning for an FMCW Radar Assisted by Thermal and RGB Cameras to Monitor Gestures and Cooking Processes
Abstract
This paper proposes a multi-modal cross learning approach that augments the neural network training phase with additional sensor data. The approach is multi-modal during training (i.e., radar Range-Doppler maps, thermal camera images, and RGB camera images are used for training). During inference, the approach is single-modal (i.e., only radar Range-Doppler maps are needed for classification). The proposed approach uses multi-modal autoencoder training, which creates a compressed data representation containing features correlated across modalities. The encoder part is then used as a pretrained network for the classification task. The benefit is that expensive sensors such as high-resolution thermal cameras are not needed in the final application, while a higher classification accuracy is achieved because of the multi-modal cross learning during training. The autoencoders can also be used to generate hallucinated data for the absent sensors. The hallucinated data can be used for user interfaces, further classification, or other tasks. The proposed approach is verified on a simultaneous cooking process classification, 2 × 2 cooktop occupancy detection, and gesture recognition task. The main functionality is overboil protection and gesture control of a 2 × 2 cooktop. The multi-modal cross learning approach considerably outperforms single-modal approaches on this challenging classification task.
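To make the training/inference asymmetry concrete, the following is a minimal sketch of the general idea in PyTorch: a shared latent code is trained to reconstruct all three modalities, and only the radar encoder is reused for single-modal classification at inference. All layer sizes, names, and the fusion scheme are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of multi-modal cross learning (assumed architecture, not the paper's network).
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Encodes one modality (e.g., a radar Range-Doppler map) into a latent vector."""
    def __init__(self, in_channels: int, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent_dim),
        )
    def forward(self, x):
        return self.net(x)

class ModalityDecoder(nn.Module):
    """Reconstructs (or 'hallucinates') one modality from the shared latent code."""
    def __init__(self, out_channels: int, out_size: int = 32, latent_dim: int = 64):
        super().__init__()
        self.out_channels, self.out_size = out_channels, out_size
        self.net = nn.Linear(latent_dim, out_channels * out_size * out_size)
    def forward(self, z):
        x = self.net(z)
        return x.view(-1, self.out_channels, self.out_size, self.out_size)

class MultiModalAutoencoder(nn.Module):
    """Shared latent space trained to reconstruct radar, thermal, and RGB inputs."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.enc_radar = ModalityEncoder(1, latent_dim)
        self.enc_thermal = ModalityEncoder(1, latent_dim)
        self.enc_rgb = ModalityEncoder(3, latent_dim)
        self.fuse = nn.Linear(3 * latent_dim, latent_dim)
        self.dec_radar = ModalityDecoder(1, 32, latent_dim)
        self.dec_thermal = ModalityDecoder(1, 32, latent_dim)
        self.dec_rgb = ModalityDecoder(3, 32, latent_dim)
    def forward(self, radar, thermal, rgb):
        z = self.fuse(torch.cat(
            [self.enc_radar(radar), self.enc_thermal(thermal), self.enc_rgb(rgb)], dim=1))
        return self.dec_radar(z), self.dec_thermal(z), self.dec_rgb(z)

# Training phase (multi-modal): minimize the reconstruction loss over all three modalities
# so the latent code captures features that are correlated across the sensors.
ae = MultiModalAutoencoder()
radar = torch.randn(4, 1, 32, 32)    # Range-Doppler maps (dummy data)
thermal = torch.randn(4, 1, 32, 32)  # thermal camera images (dummy data)
rgb = torch.randn(4, 3, 32, 32)      # RGB camera images (dummy data)
rec_radar, rec_thermal, rec_rgb = ae(radar, thermal, rgb)
loss = sum(nn.functional.mse_loss(rec, x) for rec, x in
           [(rec_radar, radar), (rec_thermal, thermal), (rec_rgb, rgb)])

# Inference phase (single-modal): reuse only the pretrained radar encoder with a
# classification head; the thermal and RGB sensors are no longer required.
classifier = nn.Sequential(ae.enc_radar, nn.Linear(64, 5))  # 5 classes is a placeholder
logits = classifier(radar)
```

In this sketch the decoders remain available after training, so the absent thermal and RGB inputs can still be hallucinated from the radar-derived latent code, matching the use cases listed in the abstract.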
Keywords