IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
Fine-Grained Urban Village Extraction by Mask Transformer From High-Resolution Satellite Images in Pearl River Delta
Abstract
Urban renewal has led to the proliferation of informal urban habitats, such as slums, shanty towns, and urban villages (UVs). As an important component of urban renewal, UVs influence urban spatial structure and land use patterns, so fine-grained extraction of UVs is of great theoretical and practical significance. Existing UV classification techniques mostly rely on machine learning and convolutional neural network based models, which struggle to capture long-range global semantic information. In this article, we propose a multiscale mask transformer model (MaskUV) for extracting UVs from high-resolution remote sensing images. It extracts both local texture features and global features. Its multiscale mask transformer module with mask attention aggregates pixel and object features at different levels, enhancing the model's recognition and generalization abilities. Using MaskUV, we extracted UVs in seven cities in the Pearl River Delta (PRD) and analyzed their spatial pattern and accessibility. Because fine-grained UV detection datasets are scarce, we also provide a novel dataset (UVSet) containing 3415 pairs of 512 × 512 high-resolution UV images and labels, with a spatial resolution of 1 m. Comparative experiments with several UV extraction models demonstrate the effectiveness of MaskUV, which achieves an F1 score of 84.39% and an IoU of 73.00% on UVSet. In addition, MaskUV achieves highly accurate detection results in the seven PRD cities, with average F1 and IoU values of 84.41% and 72.44%, respectively.
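For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how F1 and IoU are commonly computed for a binary segmentation mask. It assumes the standard pixel-wise definitions for the UV class; the abstract does not state the exact evaluation protocol, and the function names and the toy 4 × 4 masks are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels; 1 = UV (assumed encoding)


def confusion_counts(pred: Mask, label: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives per pixel."""
    tp = fp = fn = 0
    for pred_row, label_row in zip(pred, label):
        for p, t in zip(pred_row, label_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iou_score(tp: int, fp: int, fn: int) -> float:
    """IoU = TP / (TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


# Hypothetical toy masks, not data from UVSet.
pred = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
label = [[1, 1, 1, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]

tp, fp, fn = confusion_counts(pred, label)
f1 = f1_score(tp, fp, fn)
iou = iou_score(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} F1={f1:.4f} IoU={iou:.4f}")
```

When both metrics are computed from the same pixel counts, they are linked by F1 = 2·IoU / (1 + IoU). An IoU of 73.00% therefore corresponds to an F1 of about 84.39%, which agrees with the UVSet figures above. Values averaged over several cities need not satisfy this identity exactly.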
Keywords