IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2023)
Axial Cross Attention Meets CNN: Bibranch Fusion Network for Change Detection
Abstract
In recent years, the vision transformer has demonstrated a capability for global information extraction in computer vision that the convolutional neural network (CNN) lacks. However, because the vision transformer lacks inductive bias, it requires a large amount of training data, and in remote sensing it is costly to acquire large numbers of high-resolution images. Most existing deep-learning-based change detection networks rely heavily on CNNs, which cannot effectively exploit long-range dependences between pixels for difference discrimination. Therefore, this work aims to apply a high-performance vision transformer to change detection with limited data. A bibranch fusion network based on axial cross attention (ACABFNet) is proposed. The network extracts local and global image information through a CNN branch and a transformer branch, respectively, and then fuses the local and global features through a bidirectional fusion approach. In the upsampling stage, similar and difference feature information from the two branches is explicitly generated by feature addition and feature subtraction. Considering that the self-attention mechanism is not efficient enough for global attention on small datasets, we propose the axial cross attention: global attention is first computed separately along the height and width dimensions of the image, and cross attention is then used to fuse the global feature information from the two dimensions. Compared with the original self-attention, this structure is more efficient and friendlier to graphics processing units. Experimental results on three datasets show that ACABFNet outperforms existing change detection algorithms.
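The following is a minimal PyTorch sketch of the axial cross attention idea as described in the abstract: self-attention is applied separately along the height and width axes, and the two axis-wise global feature maps are then fused with a cross attention step. The module name, single-head design, and fusion details are illustrative assumptions, not the authors' implementation.

```python
# Sketch of axial cross attention, assuming single-head attention and a
# query/key-value fusion between the two axis-wise results.
import torch
import torch.nn as nn


class AxialCrossAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # One attention module per spatial axis, plus one for the fusion step.
        self.attn_h = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.attn_w = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, x):                                 # x: (B, C, H, W)
        b, c, h, w = x.shape

        # Attention along the height axis: each column is a sequence of length H.
        xh = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        xh, _ = self.attn_h(xh, xh, xh)
        xh = xh.reshape(b, w, h, c).permute(0, 3, 2, 1)   # back to (B, C, H, W)

        # Attention along the width axis: each row is a sequence of length W.
        xw = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        xw, _ = self.attn_w(xw, xw, xw)
        xw = xw.reshape(b, h, w, c).permute(0, 3, 1, 2)   # back to (B, C, H, W)

        # Cross attention fuses the two axis-wise global features:
        # height-attended features query the width-attended features.
        q = xh.flatten(2).transpose(1, 2)                 # (B, H*W, C)
        kv = xw.flatten(2).transpose(1, 2)
        fused, _ = self.cross(q, kv, kv)
        return fused.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    layer = AxialCrossAttention(dim=64)
    feats = torch.randn(2, 64, 32, 32)
    print(layer(feats).shape)                             # torch.Size([2, 64, 32, 32])
```

Because each attention pass attends over sequences of length H or W rather than H*W, the axial factorization reduces the quadratic cost of full self-attention, which is consistent with the efficiency claim in the abstract.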
Keywords