IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
Burned Area Segmentation in Optical Remote Sensing Images Driven by U-Shaped Multistage Masked Autoencoder
Abstract
Computer vision (CV) techniques for natural disaster monitoring in optical remote sensing images (ORSIs) have become an emerging research topic. Recently, the masked autoencoder (MAE) has achieved great success in CV and shows promising potential for many downstream vision tasks. However, the vision transformer (ViT) backbone of MAE operates at a single, fixed feature scale and models local spatial correlations poorly, so directly applying MAE to burned area segmentation (BAS) in ORSIs fails to achieve satisfactory results. To address this problem, we propose a novel dual-branch complement network (DCNet) driven by a U-shaped multistage masked autoencoder (UMMAE) for BAS in ORSIs, which is also the first application of MAE to BAS. UMMAE comprises four stages and introduces skip connections between the encoder and decoder at each stage, which improve feature diversity and further enhance model performance. DCNet has three major components: a ViT encoder (global branch), a convolutional encoder (local branch), and a decoder. The global branch inherits the visual representation learning ability of the pretrained UMMAE and captures global contextual information from the input image, while the local branch extracts local spatial information at multiple scales. Features from the two branches are fused in the decoder for feature complementation, which improves feature discriminability and segmentation accuracy. In addition, we build a new BAS dataset containing ORSIs of burned areas in California, USA, from 2017 to 2022. Extensive experiments on two BAS datasets demonstrate that our DCNet outperforms state-of-the-art methods.
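The pretraining scheme the abstract builds on relies on MAE's random patch masking, in which only a small fraction of patch tokens is shown to the encoder. As a minimal illustrative sketch of that masking step (not the paper's UMMAE implementation; the function name, mask ratio default, and NumPy framing are assumptions for illustration):

```python
import numpy as np

def random_mask_patches(patches, mask_ratio=0.75, seed=0):
    """MAE-style random masking: keep a random subset of patch tokens.

    patches: (N, D) array of N flattened patch embeddings.
    Returns the visible patches, the kept indices, and a boolean mask
    (True = masked, i.e., hidden from the encoder).
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    # Shuffle token indices and keep the first n_keep of them.
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], keep_idx, mask

# A 224x224 image with 16x16 patches yields 196 tokens; at a 75% mask
# ratio the encoder sees only 49 of them.
tokens = np.zeros((196, 768))
visible, keep_idx, mask = random_mask_patches(tokens)
print(visible.shape)    # (49, 768)
print(int(mask.sum()))  # 147
```

The decoder then reconstructs the masked tokens from the visible ones; in the paper's UMMAE this encoder–decoder pair is made U-shaped and multistage, with skip connections at each stage.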
Keywords