Instructional Mask Autoencoder: A Scalable Learner for Hyperspectral Image Classification

Weili Kong; Baisen Liu; Xiaojun Bi; Jiaming Pei; Zheng Chen

doi:10.1109/JSTARS.2023.3337132

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)


Affiliations

Weili Kong: School of Information and Communication Engineering, Harbin Engineering University, Harbin, China
Baisen Liu: School of Electronic and Information Engineering, Heilongjiang Institute of Technology, Harbin, China
Xiaojun Bi: School of Information Engineering, Minzu University of China, Beijing, China
Jiaming Pei: School of Computer Science, The University of Sydney, Sydney, NSW, Australia
Zheng Chen: Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing, China

DOI: https://doi.org/10.1109/JSTARS.2023.3337132
Journal volume & issue: Vol. 17
pp. 1348 – 1362

Abstract


An increasing number of hyperspectral images (HSIs) are becoming available, but the utilization of unlabeled HSIs remains extremely low because annotation is costly. It is therefore crucial to find ways to exploit these unlabeled HSIs to enhance classification performance. Self-supervised training makes it possible to acquire latent features from unlabeled HSIs and thereby enhance network performance via transfer learning. However, most current networks for HSIs are inflexible, which makes it challenging for them to perform such learning and to accommodate multimodal HSIs. We therefore devise a scalable self-supervised network called the instructional mask autoencoder, which can extract general patterns of HSIs from these unannotated data. It primarily consists of a spatial–spectral embedding block and a transformer-based masked autoencoder, which are used to project input samples into the same latent space and to learn higher-level semantic information, respectively. Moreover, we use a random token called $ins\_token$ to instruct the model to learn the components of global information that are highly correlated with the target pixel in HSI samples. In the fine-tuning stage, we design a learnable aggregation mechanism to bring all tokens into full play. The results show that our method exhibits robust generalization performance and accelerates convergence across diverse datasets. With limited samples, we conducted experiments on three structurally distinct HSIs, all of which achieved competitive performance. Compared with state-of-the-art methods, our approach achieved respective improvements of 1.97%, 0.44%, and 3.35% on these three datasets.
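As a rough illustration of the two ideas named in the abstract (masking patch tokens while an extra instruction token is prepended, and aggregating all tokens at fine-tuning time), the following is a minimal sketch and not the authors' implementation. The abstract does not give the masking ratio, the token dimensions, whether the instruction token is exempt from masking, or the form of the aggregation; the 0.75 ratio, the always-visible instruction token, the softmax-weighted sum, and the helper names `random_mask` and `aggregate` are all assumptions made here for illustration.

```python
import math
import random


def random_mask(num_tokens, mask_ratio, rng):
    """Split patch-token indices into (visible, masked) lists, in the style of a masked autoencoder."""
    num_masked = int(num_tokens * mask_ratio)
    order = list(range(num_tokens))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def aggregate(tokens, scores):
    """Softmax-weighted sum of token vectors; `scores` stand in for learnable weights (assumed form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(tokens[0])
    return [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]


rng = random.Random(0)
num_patches, dim = 16, 4  # hypothetical sizes, not taken from the paper

# Stand-ins for embedded patch tokens and a randomly initialised instruction token.
patch_tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_patches)]
ins_token = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Pre-training: mask most patch tokens; the instruction token is kept visible (assumption).
visible, masked = random_mask(num_patches, mask_ratio=0.75, rng=rng)
encoder_input = [ins_token] + [patch_tokens[i] for i in visible]

# Fine-tuning: no masking; pool every token into one feature vector.
all_tokens = [ins_token] + patch_tokens
scores = [0.0] * len(all_tokens)  # untrained scores give a uniform average
pooled = aggregate(all_tokens, scores)
```

In a real model the scores would be trained parameters and the pooled vector would feed a classification head; here the zero scores simply reduce the aggregation to a mean over all tokens.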

Published in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ISSN: 1939-1404 (Print); 2151-1535 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Ocean engineering; Science: Physics: Geophysics. Cosmic physics
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4609443
