IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

BIHAF-Net: Bilateral Interactive Hierarchical Adaptive Fusion Network for Collaborative Classification of Hyperspectral and LiDAR Data

  • Yunji Zhao
  • Wenming Bao
  • Jun Xu
  • Xiaozhuo Xu

DOI
https://doi.org/10.1109/JSTARS.2024.3453936
Journal volume & issue
Vol. 17
pp. 15971 – 15988

Abstract

Multimodal remote sensing data can portray land-cover characteristics more comprehensively than single-modality data, and deep learning offers powerful feature extraction capability. Deep learning-based methods have therefore been widely used for collaborative classification of hyperspectral image (HSI) and light detection and ranging (LiDAR) data, and they achieve competitive classification performance. However, existing methods either overlook complementary information between the modalities during feature extraction or overly highlight the features of one modality during multimodal feature interaction. In addition, some methods integrate the extracted multimodal features using a straightforward cross-attention mechanism, which struggles to adequately emphasize the relative importance of the multimodal features and tends to lose the local detail information of intramodal features. This article therefore proposes a bilateral interactive hierarchical adaptive fusion network (BIHAF-Net) for collaborative classification of HSI and LiDAR data. First, the proposed model adopts a two-branch structure in which each branch sequentially connects a multilevel convolutional neural network feature extractor and a spectral–spatial transformer; the two branches mine discriminative high-level semantic information from the HSI and LiDAR data, respectively. Second, a bilateral interactive feedback module is designed to enhance the spatial feature representation ability of the HSI information and the spectral feature representation ability of the LiDAR information. Finally, a cross-modal hierarchical adaptive fusion module is developed to dynamically fuse the extracted multimodal features, which not only highlights the relative importance of the multimodal features but also preserves the local detail information of intramodal features. Experiments are conducted on four benchmark HSI and LiDAR datasets, and the experimental results demonstrate that the proposed BIHAF-Net achieves better classification performance.

Keywords