Efficient Water Segmentation with Transformer and Knowledge Distillation for USVs

Jingting Zhang; Jiantao Gao; Jinshuo Liang; Yiqiang Wu; Bin Li; Yang Zhai; Xiaomao Li

doi:10.3390/jmse11050901

Journal of Marine Science and Engineering (Apr 2023)

Affiliations

Jingting Zhang: Research Institute of USV Engineering, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
Jiantao Gao: Research Institute of USV Engineering, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
Jinshuo Liang: Research Institute of USV Engineering, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
Yiqiang Wu: Research Institute of USV Engineering, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
Bin Li: National Centre for Archaeology (NACA), Beijing 100013, China
Yang Zhai: Shanghai Cultural Heritage Conservation and Research Centre, Shanghai 200031, China
Xiaomao Li: Research Institute of USV Engineering, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China

DOI: https://doi.org/10.3390/jmse11050901
Journal volume & issue: Vol. 11, no. 5, p. 901

Abstract

Water segmentation is a critical task for ensuring the safety of unmanned surface vehicles (USVs). Most existing image-based water segmentation methods can be inaccurate because of light reflection on the water. Fusion-based methods take paired 2D camera images and 3D LiDAR point clouds as inputs, which results in a high computational load and considerable time consumption and limits their practical application. Thus, in this study, we propose a multimodal fusion water segmentation method that uses a transformer and knowledge distillation to leverage 3D LiDAR point clouds in support of 2D image-based segmentation. A local and non-local cross-modality fusion module based on a transformer is first used to fuse 2D image and 3D point cloud information during the training phase. A multi-to-single-modality knowledge distillation module is then applied to distill the fused information into a pure 2D network for water segmentation. Extensive experiments were conducted on a dataset containing various scenes collected by USVs in the water. The results demonstrate that the proposed method achieves an improvement of approximately 1.5% in both accuracy and MaxF over classical image-based methods, and it is much faster than the fusion-based method, achieving speeds ranging from 15 fps to 110 fps.
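
The abstract does not give the form of the distillation objective, so the following is only a minimal sketch of a generic temperature-scaled distillation loss of the kind commonly used to transfer knowledge from a teacher to a student, applied per pixel to two classes (water, non-water). It is not the authors' module: the function names, the `temperature` and `alpha` parameters, and the two-class logit layout are all assumptions made for illustration. The teacher logits stand in for the output of a fused image and LiDAR network, the student logits for the image-only network.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum for s in scaled]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Per-pixel loss: (1 - alpha) * cross-entropy + alpha * T^2 * KL(teacher || student).

    Hypothetical formulation; the paper's actual objective may differ.
    """
    # Hard-label cross-entropy of the student at temperature 1.
    ce = -log_softmax(student_logits)[label]
    # Soft-target KL divergence between softened teacher and student outputs.
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    return (1.0 - alpha) * ce + alpha * temperature ** 2 * kl

def mean_pixel_loss(student_map, teacher_map, labels, **kwargs):
    """Average the per-pixel loss over flat lists of pixels."""
    losses = [distillation_loss(s, t, y, **kwargs)
              for s, t, y in zip(student_map, teacher_map, labels)]
    return sum(losses) / len(losses)

if __name__ == "__main__":
    # Two pixels, logits ordered as [water, non-water]; values are made up.
    student = [[0.2, -0.1], [-0.5, 0.4]]
    teacher = [[2.0, -2.0], [-1.5, 1.5]]
    labels = [0, 1]
    print(mean_pixel_loss(student, teacher, labels))
```

In this kind of setup the teacher is needed only while training, so inference runs the 2D student alone, which is consistent with the speed advantage over fusion-based inference that the abstract reports.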

Published in Journal of Marine Science and Engineering

ISSN: 2077-1312 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Naval Science: Naval architecture. Shipbuilding. Marine engineering; Geography. Anthropology. Recreation: Oceanography
Website: http://www.mdpi.com/journal/jmse
