IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2022)

A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation

  • Wei Xiong,
  • Zhenyu Xiong,
  • Yaqi Cui

DOI
https://doi.org/10.1109/JSTARS.2022.3189052
Journal volume & issue
Vol. 15
pp. 5440 – 5454

Abstract


The increasing number and complex content of aerial images have made some recent deep-learning-based methods ill-suited to different aerial image processing tasks. The coarse-grained feature representations produced by these methods are not discriminative enough. Moreover, confounding factors in the datasets and the long-tailed distribution of the training data lead to biased and spurious associations among the objects in aerial images. This study proposes a confounder-free fusion network (CFF-NET) to address these challenges. Global and local feature extraction branches are designed to capture comprehensive and fine-grained deep features from the whole image. Specifically, to extract discriminative local features and exploit contextual information across different regions, models based on gated recurrent units are constructed to extract features of each image region and to output an importance weight for each region. Furthermore, a confounder-free object feature extraction branch is proposed to generate reasonable visual attention and provide additional multigrained image information; it also eliminates spurious and biased visual relationships in the image at the object level. Finally, the outputs of the three branches are combined to obtain the fused feature representation. Extensive experiments are conducted on three popular aerial image processing tasks: 1) image classification, 2) image retrieval, and 3) image captioning. The proposed CFF-NET achieves reasonable and state-of-the-art results, including on high-level tasks such as aerial image captioning.
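To make the three-branch fusion described in the abstract more concrete, the following is a minimal PyTorch sketch: a GRU runs over region features and produces per-region importance weights (the local branch), a simple attention pools object-level features (standing in for the confounder-free object branch), and the global, local, and object features are concatenated into one fused representation. All module names, dimensions, and the concatenation-based fusion are illustrative assumptions, not the authors' exact CFF-NET design.

```python
# Illustrative sketch only; not the published CFF-NET implementation.
import torch
import torch.nn as nn


class LocalRegionBranch(nn.Module):
    """GRU over image-region features; outputs an importance-weighted local feature."""
    def __init__(self, region_dim=512, hidden_dim=256):
        super().__init__()
        self.gru = nn.GRU(region_dim, hidden_dim, batch_first=True)
        self.weight_head = nn.Linear(hidden_dim, 1)  # scalar importance weight per region

    def forward(self, regions):                       # regions: (B, R, region_dim)
        states, _ = self.gru(regions)                 # contextual region states (B, R, hidden_dim)
        alpha = torch.softmax(self.weight_head(states), dim=1)  # region weights (B, R, 1)
        return (alpha * states).sum(dim=1)            # weighted local feature (B, hidden_dim)


class CFFNetSketch(nn.Module):
    """Fuses global, local (GRU-based), and object-level features by concatenation."""
    def __init__(self, global_dim=512, region_dim=512, object_dim=512, hidden_dim=256):
        super().__init__()
        self.global_proj = nn.Linear(global_dim, hidden_dim)
        self.local_branch = LocalRegionBranch(region_dim, hidden_dim)
        self.object_attn = nn.Linear(object_dim, 1)   # attention over detected objects
        self.object_proj = nn.Linear(object_dim, hidden_dim)

    def forward(self, global_feat, region_feats, object_feats):
        g = self.global_proj(global_feat)                              # global branch (B, hidden_dim)
        l = self.local_branch(region_feats)                            # local branch (B, hidden_dim)
        beta = torch.softmax(self.object_attn(object_feats), dim=1)    # object weights (B, O, 1)
        o = self.object_proj((beta * object_feats).sum(dim=1))         # object branch (B, hidden_dim)
        return torch.cat([g, l, o], dim=-1)                            # fused representation


# Usage on random tensors: batch of 2 images, 9 regions, 5 detected objects.
fused = CFFNetSketch()(torch.randn(2, 512), torch.randn(2, 9, 512), torch.randn(2, 5, 512))
print(fused.shape)  # torch.Size([2, 768])
```

The fused vector can then feed task-specific heads (a classifier, a retrieval embedding, or a captioning decoder), which is how a single representation can serve the three tasks evaluated in the paper.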

Keywords