IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)
Zero-Shot Remote Sensing Scene Classification Method Based on Local-Global Feature Fusion and Weight Mapping Loss
Abstract
Zero-shot remote sensing scene classification aims to enable a model to recognize unseen-class scenes using knowledge learned from seen-class scenes, and it has become a research hotspot in the field of remote sensing. Contemporary approaches primarily focus on extracting global information from scenes while neglecting nuanced local landscape features, which diminishes the discriminative capability of recognition models. Furthermore, these methods overlook the semantic relevance between seen and unseen class scenes during training, so learning is not emphasized on the seen scenes most relevant to the unseen ones, and classification performance declines accordingly. To address these challenges, this article proposes the Zero-Shot Remote Sensing Scene Classification Method Based on Local-Global Feature Fusion and Weight Mapping Loss (LGFFWM). The design incorporates a local-global feature fusion (LGFF) module that adaptively labels and models the internal local landscapes of a scene and fuses them with global features, yielding a more discriminative representation of remote sensing scenes. In addition, a weight mapping loss (WM Loss) is introduced that leverages a semantic correlation matrix to assign higher training weights to seen class scenes strongly correlated with unseen class scenes, compelling the model to prioritize learning from them. Extensive experiments conducted on the classical remote sensing scene datasets UCM, AID, and NWPU demonstrate the superiority of the proposed LGFFWM method over ten advanced comparative methods, with overall accuracy improvements of more than 2.25%, 3.47%, and 0.44%, respectively. Additional experiments on the SIRI-WHU and RSSCN7 datasets underscore the transferability of LGFFWM, which achieves overall accuracies of 53.50% and 47.37%, respectively.
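The abstract describes the WM Loss only at a high level. The following minimal PyTorch sketch illustrates one plausible reading: per-class training weights derived from a seen-to-unseen semantic correlation matrix and plugged into a weighted cross-entropy. The function names, the cosine-similarity correlation, and the max-over-unseen-classes aggregation are assumptions for illustration, not the authors' formulation.

```python
# Illustrative sketch only; all names and design choices below are hypothetical.
import torch
import torch.nn.functional as F

def class_weights_from_semantics(seen_sem: torch.Tensor,
                                 unseen_sem: torch.Tensor) -> torch.Tensor:
    """Derive one training weight per seen class from its cosine correlation
    with the unseen-class semantic vectors (assumed formulation)."""
    seen = F.normalize(seen_sem, dim=1)      # (n_seen, d) semantic embeddings
    unseen = F.normalize(unseen_sem, dim=1)  # (n_unseen, d)
    corr = seen @ unseen.t()                 # (n_seen, n_unseen) correlation matrix
    # A seen class highly correlated with any unseen class receives a larger weight.
    weights = corr.max(dim=1).values.clamp(min=0.0)
    return 1.0 + weights                     # keep a baseline weight of 1 for every class

def weight_mapping_loss(logits: torch.Tensor,
                        targets: torch.Tensor,
                        class_weights: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over seen classes, weighted by the semantic mapping above."""
    return F.cross_entropy(logits, targets, weight=class_weights)

if __name__ == "__main__":
    n_seen, n_unseen, d, batch = 15, 6, 300, 8
    seen_sem = torch.randn(n_seen, d)        # e.g., word-embedding class prototypes
    unseen_sem = torch.randn(n_unseen, d)
    w = class_weights_from_semantics(seen_sem, unseen_sem)
    logits = torch.randn(batch, n_seen)
    targets = torch.randint(0, n_seen, (batch,))
    print(weight_mapping_loss(logits, targets, w))
```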
Keywords