Multi-label remote sensing classification with self-supervised gated multi-modal transformers

Na Liu; Ye Yuan; Guodong Wu; Sai Zhang; Jie Leng; Lihong Wan

doi:10.3389/fncom.2024.1404623

Frontiers in Computational Neuroscience (Sep 2024)

Affiliations

Na Liu: University of Shanghai for Science and Technology, Institute of Machine Intelligence, Shanghai, China
Ye Yuan: University of Shanghai for Science and Technology, Institute of Machine Intelligence, Shanghai, China
Guodong Wu: Origin Dynamics Intelligent Robot Co., Ltd., Zhengzhou, China
Sai Zhang: Origin Dynamics Intelligent Robot Co., Ltd., Zhengzhou, China
Jie Leng: Origin Dynamics Intelligent Robot Co., Ltd., Zhengzhou, China
Lihong Wan: Origin Dynamics Intelligent Robot Co., Ltd., Zhengzhou, China

DOI: https://doi.org/10.3389/fncom.2024.1404623
Journal volume & issue: Vol. 18

Abstract

Introduction: Following the success of Transformers in machine learning, they are attracting growing interest in remote sensing (RS). However, RS research has been hampered by the lack of large labeled datasets and by inconsistent data modalities arising from the diversity of RS platforms. With the rise of self-supervised learning (SSL) algorithms in recent years, RS researchers have begun to apply the "pre-training and fine-tuning" paradigm to RS. Even so, there is little research on multi-modal data fusion in remote sensing: most studies either use only one modality or simply concatenate data from multiple modalities.

Method: To study a more efficient multi-modal data fusion scheme, we propose a multi-modal fusion mechanism based on gated unit control (MGSViT). We pretrain a ViT model on the BigEarthNet dataset by combining two commonly used SSL algorithms, and we propose intra-modal and inter-modal gated fusion units for feature learning that combine multispectral (MS) and synthetic aperture radar (SAR) data. Our method can effectively combine data from different modalities to extract key feature information.

Results and discussion: In fine-tuning and comparison experiments, our method outperforms state-of-the-art algorithms on all downstream classification tasks, which verifies the validity of the proposed method.
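
The abstract does not give the form of the gated fusion unit, so the following is only a minimal sketch of a generic element-wise gated fusion of two feature vectors, not the MGSViT unit itself. The function name `gated_fusion`, the per-element weights `w_ms` and `w_sar`, and the `bias` are all hypothetical and chosen for illustration; the assumption is that a sigmoid gate decides, per feature, how much of the MS value versus the SAR value to keep.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(ms_feat, sar_feat, w_ms, w_sar, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Illustrative assumption, not the paper's definition:
        gate  = sigmoid(w_ms * ms + w_sar * sar + bias)
        fused = gate * ms + (1 - gate) * sar
    """
    lengths = {len(ms_feat), len(sar_feat), len(w_ms), len(w_sar), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")

    fused = []
    for m, s, wm, ws, b in zip(ms_feat, sar_feat, w_ms, w_sar, bias):
        gate = sigmoid(wm * m + ws * s + b)  # in (0, 1): weight given to MS
        fused.append(gate * m + (1.0 - gate) * s)
    return fused


# With zero weights and bias the gate is 0.5, so fusion reduces to averaging.
ms = [1.0, 2.0]
sar = [3.0, 4.0]
zeros = [0.0, 0.0]
averaged = gated_fusion(ms, sar, zeros, zeros, zeros)
```

In a trained model the gate parameters would be learned, so the contrast with plain concatenation is that the gate can suppress an uninformative modality feature by feature rather than passing both through unchanged.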

Published in Frontiers in Computational Neuroscience

ISSN: 1662-5188 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: http://www.frontiersin.org/computational_neuroscience
