Applied Sciences (Oct 2023)

Two-Stage Fusion-Based Audiovisual Remote Sensing Scene Classification

  • Yaming Wang,
  • Yiyang Liu,
  • Wenqing Huang,
  • Xiaoping Ye,
  • Mingfeng Jiang

DOI
https://doi.org/10.3390/app132111890
Journal volume & issue
Vol. 13, no. 21
p. 11890

Abstract

Scene classification is a pivotal research area in remote sensing, traditionally relying on visual information from aerial images for labeling. Ground environment audio, a novel geospatial data source, adds valuable information for scene classification. However, bridging the structural gap between aerial images and ground environment audio is challenging, which makes popular two-branch networks ineffective for direct data fusion. To address this issue, this study presents the Two-stage Fusion-based Audiovisual Classification Network (TFAVCNet). TFAVCNet uses separate audio and visual modules to extract deep semantic features from ground environment audio and remote sensing images, respectively. The audiovisual fusion module fuses information from both modalities at the feature and decision levels, which facilitates joint training and yields a more robust solution. Experimental results on the ADVANCE dataset for remote sensing audiovisual scene classification show that the proposed method outperforms existing approaches.
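
The abstract does not specify how the two fusion stages are implemented, so the following is only a minimal sketch of the general idea, not the TFAVCNet architecture. It assumes concatenation as the feature-level operator and a weighted average of per-branch class probabilities as the decision-level operator. Every name here (two_stage_fusion, heads, mix, random_layer), the feature sizes, the class count, and the random weights are hypothetical placeholders for trained audio and visual modules.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(features, weights, bias):
    """Apply one dense layer; weights holds one row per output class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Hypothetical stand-in for trained classifier weights."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def two_stage_fusion(audio_feat, visual_feat, heads, mix=(1 / 3, 1 / 3, 1 / 3)):
    """Fuse at the feature level, then at the decision level (assumed operators)."""
    # Stage 1 (feature level): concatenate the two feature vectors.
    fused_feat = audio_feat + visual_feat
    probs = [
        softmax(linear(audio_feat, *heads["audio"])),
        softmax(linear(visual_feat, *heads["visual"])),
        softmax(linear(fused_feat, *heads["fused"])),
    ]
    # Stage 2 (decision level): weighted average of the branch probabilities.
    n_classes = len(probs[0])
    return [sum(m * p[c] for m, p in zip(mix, probs)) for c in range(n_classes)]


# Illustrative sizes only; none of these values come from the paper.
rng = random.Random(0)
n_classes, audio_dim, visual_dim = 5, 4, 6
audio_feat = [rng.random() for _ in range(audio_dim)]
visual_feat = [rng.random() for _ in range(visual_dim)]
heads = {
    "audio": random_layer(rng, n_classes, audio_dim),
    "visual": random_layer(rng, n_classes, visual_dim),
    "fused": random_layer(rng, n_classes, audio_dim + visual_dim),
}
scores = two_stage_fusion(audio_feat, visual_feat, heads)
predicted = max(range(n_classes), key=scores.__getitem__)
```

In a jointly trained network the three classifier heads would share one loss, so that the decision-level average and the feature-level representation are optimized together; the fixed mix weights above could equally be learned.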

Keywords