MSCAC: A Multi-Scale Swin–CNN Framework for Progressive Remote Sensing Scene Classification

A. Arun Solomon; S. Akila Agnes

doi:10.3390/geographies4030025

Geographies (Jul 2024)

Affiliations

A. Arun Solomon: Department of Civil Engineering, GMR Institute of Technology, Rajam 532127, India
S. Akila Agnes: Department of Computer Science and Engineering, GMR Institute of Technology, Rajam 532127, India

DOI: https://doi.org/10.3390/geographies4030025
Journal volume & issue: Vol. 4, no. 3
pp. 462 – 480

Abstract

Recent advances in deep learning have significantly improved remote sensing scene classification, a critical task in remote sensing applications. This study presents a new aerial scene classification model, the Multi-Scale Swin–CNN Aerial Classifier (MSCAC), built on the Swin Transformer, an architecture that has demonstrated exceptional performance across a range of computer vision applications. The Swin Transformer uses a shifted-window mechanism to model long-range dependencies and local features efficiently, making it particularly suitable for the complex and varied textures of aerial imagery. The model is designed to capture intricate spatial hierarchies and diverse scene characteristics at multiple scales. The proposed framework integrates the Swin Transformer with a multi-scale strategy, enabling the extraction of robust features from aerial images of different resolutions and contexts. This approach allows the model to learn from both global structures and fine-grained details, which is crucial for accurate scene classification. The model is evaluated on several benchmark datasets, including UC-Merced, WHU-RS19, RSSCN7, and AID, where its accuracy is superior or comparable to that of state-of-the-art models. The MSCAC model’s adaptability to varying amounts of training data, and its ability to improve as more data become available, make it a promising tool for real-world remote sensing applications. This study highlights the potential of integrating advanced deep learning architectures such as the Swin Transformer into aerial scene classification, paving the way for more sophisticated and accurate remote sensing systems. The findings suggest that the proposed model has significant potential for remote sensing applications including land cover mapping, urban planning, and environmental monitoring.
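
The shifted-window idea mentioned in the abstract can be illustrated with a small NumPy sketch. It is not taken from the MSCAC implementation: the function names, the 8 × 8 toy feature map, and the window size of 4 are assumptions chosen for illustration. The sketch shows only how a feature map is split into non-overlapping windows and how a cyclic shift by half a window makes the next partition straddle the previous window boundaries. The attention computation itself, and the masking that Swin applies to windows that wrap around the border, are omitted.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping square windows.

    Returns an array of shape (num_windows, window_size, window_size, C).
    """
    h, w, c = x.shape
    assert h % window_size == 0 and w % window_size == 0, "H and W must be multiples of window_size"
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    # Bring the two window-grid axes together, then flatten them into one axis.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, c)


def shifted_window_partition(x, window_size):
    """Cyclically shift the map by half a window, then partition it.

    The shifted windows overlap the boundaries of the regular windows,
    which lets information flow between neighbouring windows.
    """
    shift = window_size // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)


# Toy 8x8 single-channel map whose values encode position (row * 8 + col).
feature_map = np.arange(8 * 8, dtype=np.float32).reshape(8, 8, 1)
regular = window_partition(feature_map, window_size=4)
shifted = shifted_window_partition(feature_map, window_size=4)

print(regular.shape)        # four 4x4 windows
print(regular[0, :, :, 0])  # rows 0-3, cols 0-3 of the original map
print(shifted[0, :, :, 0])  # rows 2-5, cols 2-5 of the original map
```

In this toy case the first shifted window covers rows 2 to 5 and columns 2 to 5, so it draws on all four regular windows at once; alternating the two partitions across layers is what allows window-local attention to model longer-range dependencies.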

Published in Geographies

ISSN: 2673-7086 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Geography. Anthropology. Recreation: Geography (General)
Website: https://www.mdpi.com/journal/geographies
