Burned Area Segmentation in Optical Remote Sensing Images Driven by U-Shaped Multistage Masked Autoencoder

Yuxiang Fu; Wei Fang; Victor S. Sheng

doi:10.1109/JSTARS.2024.3402122

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

Burned Area Segmentation in Optical Remote Sensing Images Driven by U-Shaped Multistage Masked Autoencoder

Yuxiang Fu,
Wei Fang,
Victor S. Sheng

Affiliations

Yuxiang Fu: ORCiD; School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, China
Wei Fang: ORCiD; School of Computer Science, Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing, China
Victor S. Sheng: ORCiD; Department of Computer Science, Texas Tech University, Lubbock, TX, USA

DOI: https://doi.org/10.1109/JSTARS.2024.3402122
Journal volume & issue: Vol. 17
pp. 10770 – 10780

Abstract

Read online

Computer vision (CV) for natural disaster monitoring from optical remote sensing images (ORSIs) has been an emerging topic in analyzing ORSIs. Recently masked autoencoder (MAE) has achieved great success in CV and shown promising potential for many downstream vision tasks. However, due to the inherent limitation of vision transformer (ViT) in MAE which has a fixed feature scale and performs poorly in modeling local spatial correlation, directly applying MAE to burned area segmentation (BAS) in ORSIs fails to achieve satisfactory results. To address this problem, we propose a novel dual-branch complement network (DCNet) driven by U-shaped multistage masked autoencoder (UMMAE) for BAS in ORSIs, which is also the first application of MAE in BAS. UMMAE has four stages and introduces skip connection between the encoder and decoder at the same stage, which improves the feature diversity and further enhances the model performance. DCNet has three major components: the ViT encoder (global branch), the convolution encoder (local branch), and the decoder. The global branch inherits visual representation learning ability from the pretrained UMMAE and captures global contextual information from the input image, while the local branch extracts local spatial information at different scales. Features from two different branches are fused in the decoder for feature complementation, which improves feature discriminability and segmentation accuracy. Besides, we build a new BAS dataset containing ORSIs of burned area in California, USA, from 2017 to 2022. Extensive experiments on two BAS datasets demonstrate that our DCNet outperforms the state-of-the-art methods.

Published in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ISSN: 1939-1404 (Print); 2151-1535 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Ocean engineering; Science: Physics: Geophysics. Cosmic physics
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4609443

About the journal

Abstract

Keywords