Point Cloud Hand–Object Segmentation Using Multimodal Imaging with Thermal and Color Data for Safe Robotic Object Handover

Yan Zhang; Steffen Müller; Benedict Stephan; Horst-Michael Gross; Gunther Notni

doi:10.3390/s21165676

Sensors (Aug 2021)

Point Cloud Hand–Object Segmentation Using Multimodal Imaging with Thermal and Color Data for Safe Robotic Object Handover

Yan Zhang,
Steffen Müller,
Benedict Stephan,
Horst-Michael Gross,
Gunther Notni

Affiliations

Yan Zhang: Group for Quality Assurance and Industrial Image Processing, Technische Universität Ilmenau, 98693 Ilmenau, Germany
Steffen Müller: Neuroinformatics and Cognitive Robotics Lab, Technische Universität Ilmenau, 98693 Ilmenau, Germany
Benedict Stephan: Neuroinformatics and Cognitive Robotics Lab, Technische Universität Ilmenau, 98693 Ilmenau, Germany
Horst-Michael Gross: Neuroinformatics and Cognitive Robotics Lab, Technische Universität Ilmenau, 98693 Ilmenau, Germany
Gunther Notni: Group for Quality Assurance and Industrial Image Processing, Technische Universität Ilmenau, 98693 Ilmenau, Germany

DOI: https://doi.org/10.3390/s21165676
Journal volume & issue: Vol. 21, no. 16
p. 5676

Abstract

Read online

This paper presents an application of neural networks operating on multimodal 3D data (3D point cloud, RGB, thermal) to effectively and precisely segment human hands and objects held in hand to realize a safe human–robot object handover. We discuss the problems encountered in building a multimodal sensor system, while the focus is on the calibration and alignment of a set of cameras including RGB, thermal, and NIR cameras. We propose the use of a copper–plastic chessboard calibration target with an internal active light source (near-infrared and visible light). By brief heating, the calibration target could be simultaneously and legibly captured by all cameras. Based on the multimodal dataset captured by our sensor system, PointNet, PointNet++, and RandLA-Net are utilized to verify the effectiveness of applying multimodal point cloud data for hand–object segmentation. These networks were trained on various data modes (XYZ, XYZ-T, XYZ-RGB, and XYZ-RGB-T). The experimental results show a significant improvement in the segmentation performance of XYZ-RGB-T (mean Intersection over Union: 82.8% by RandLA-Net) compared with the other three modes (77.3% by XYZ-RGB, 35.7% by XYZ-T, 35.7% by XYZ), in which it is worth mentioning that the Intersection over Union for the single class of hand achieves 92.6%.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors

About the journal

Abstract

Keywords