IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
A Novel Network-Level Fusion Architecture of Proposed Self-Attention and Vision Transformer Models for Land Use and Land Cover Classification From Remote Sensing Images
Abstract
Recent deep learning techniques, driven by vast amounts of data, have demonstrated the remarkable feature-learning power of convolutional neural networks (CNNs) for land use and land cover classification in remote sensing (RS). In this work, we propose a new network-level fusion deep architecture based on a 16-tiny Vision Transformer and SIBNet. In the initial phase, data augmentation is performed to resolve the problem of data imbalance. Next, we propose a self-attention bottleneck-based inception CNN named SIBNet, which combines two architectural ideas: its blocks follow the inception design, and each inception module is built from bottleneck blocks. The 16-tiny Vision Transformer architecture is adapted to RS images and, for the first time, fused with SIBNet through network-level fusion. The hyperparameters of the proposed model are initialized using Bayesian optimization for better training on RS images. After fusion, the model is trained on RS image datasets, and deep features are extracted from the self-attention layer. The extracted features are classified using a neural network classifier with multiple hidden layers. Experiments are conducted on two publicly available datasets, EuroSAT and NWPU, yielding accuracies of 97.8% and 98.9%, respectively. A detailed ablation study shows that the fusion model achieves improved accuracy, and a comparison of the proposed method with recent techniques shows improved precision, recall, and accuracy.
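To make the network-level fusion idea concrete, the following is a minimal, illustrative PyTorch sketch (not the authors' released code): a CNN branch built from inception-style modules with 1x1 bottlenecks and a self-attention layer, fused by feature concatenation with a small transformer branch of roughly ViT-tiny/16 dimensions. All layer sizes, the class count, the 224x224 input, and the class names FusionNet/BottleneckInception are assumptions for demonstration only.

```python
# Illustrative sketch of network-level fusion: a bottleneck-inception CNN branch
# with self-attention ("SIBNet"-style) concatenated with a ViT-tiny-like branch.
# All hyperparameters below are assumed values, not those reported in the paper.
import torch
import torch.nn as nn


class BottleneckInception(nn.Module):
    """Inception-style module whose parallel paths use 1x1 bottleneck convolutions."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        mid = out_ch // 4
        self.p1 = nn.Conv2d(in_ch, mid, 1)                                  # 1x1 path
        self.p3 = nn.Sequential(nn.Conv2d(in_ch, mid, 1),
                                nn.Conv2d(mid, mid, 3, padding=1))          # 1x1 -> 3x3
        self.p5 = nn.Sequential(nn.Conv2d(in_ch, mid, 1),
                                nn.Conv2d(mid, mid, 5, padding=2))          # 1x1 -> 5x5
        self.pp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, mid, 1))                   # pool -> 1x1
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(torch.cat([self.p1(x), self.p3(x),
                                   self.p5(x), self.pp(x)], dim=1))


class FusionNet(nn.Module):
    """Fuses CNN-branch and transformer-branch features by concatenation."""
    def __init__(self, num_classes=10, embed=192):
        super().__init__()
        # CNN branch: stem + bottleneck-inception blocks + global average pooling.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
            BottleneckInception(64, 128),
            BottleneckInception(128, 256),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Self-attention over the pooled CNN feature (stand-in for the paper's
        # self-attention layer used for deep feature extraction).
        self.attn = nn.MultiheadAttention(256, num_heads=4, batch_first=True)
        # Transformer branch: 16x16 patch embedding + encoder with ViT-tiny-like width.
        self.patch = nn.Conv2d(3, embed, kernel_size=16, stride=16)
        enc = nn.TransformerEncoderLayer(embed, nhead=3,
                                         dim_feedforward=embed * 4,
                                         batch_first=True)
        self.vit = nn.TransformerEncoder(enc, num_layers=4)
        # Neural network classifier with a hidden layer over the fused features.
        self.head = nn.Sequential(nn.Linear(256 + embed, 256), nn.ReLU(inplace=True),
                                  nn.Linear(256, num_classes))

    def forward(self, x):
        f_cnn = self.cnn(x)                                  # (B, 256)
        f_cnn, _ = self.attn(f_cnn.unsqueeze(1),
                             f_cnn.unsqueeze(1),
                             f_cnn.unsqueeze(1))
        f_cnn = f_cnn.squeeze(1)
        tokens = self.patch(x).flatten(2).transpose(1, 2)    # (B, N_patches, embed)
        f_vit = self.vit(tokens).mean(dim=1)                 # token-averaged (B, embed)
        return self.head(torch.cat([f_cnn, f_vit], dim=1))   # fused prediction


logits = FusionNet()(torch.randn(2, 3, 224, 224))            # -> shape (2, 10)
```

The key design choice sketched here is that fusion happens at the network level, i.e., both branches see the same input image and their feature vectors are concatenated before the classifier, rather than averaging predictions from two separately trained models.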
Keywords