Remote Sensing (Aug 2024)

Multimodal Semantic Collaborative Classification for Hyperspectral Images and LiDAR Data

  • Aili Wang
  • Shiyu Dai
  • Haibin Wu
  • Yuji Iwahori

DOI: https://doi.org/10.3390/rs16163082
Journal volume & issue: Vol. 16, No. 16, p. 3082

Abstract

Although the collaborative use of hyperspectral images (HSIs) and LiDAR data in land cover classification has demonstrated significant importance and potential, several challenges remain. Notably, the heterogeneity involved in integrating cross-modal information presents a major obstacle. Furthermore, most existing research relies heavily on category names alone, neglecting the rich contextual information available in language descriptions. Visual-language pretraining (VLP) has achieved notable success in image recognition in natural domains by using multimodal information to improve training efficiency and effectiveness, and it has also shown great potential for land cover classification in remote sensing. This paper introduces a dual-sensor multimodal semantic collaborative classification network (DSMSC2N). It uses large language models (LLMs) in an instruction-driven manner to generate land cover category descriptions enriched with remote sensing domain knowledge, guiding the model to focus on and extract key features accurately. At the same time, it integrates and exploits the complementary relationship between HSI and LiDAR data, enhancing the separability of land cover categories and improving classification accuracy. Comprehensive experiments on the Houston 2013, Trento, and MUUFL Gulfport benchmark datasets validate the effectiveness of DSMSC2N against various baseline methods.
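To make the kind of pipeline the abstract describes more concrete, the sketch below shows a toy CLIP-style classifier that fuses features from an HSI patch and a LiDAR patch and scores them against text embeddings of per-class descriptions. This is a minimal illustration under assumed settings, not the authors' DSMSC2N: the encoder sizes, patch shapes, class count, and the `ToyFusionClassifier` name are all hypothetical, and the text embeddings are random placeholders standing in for a frozen text encoder applied to LLM-generated descriptions.

```python
# Minimal conceptual sketch (not the paper's DSMSC2N): fuse HSI and LiDAR
# patch features and score them against text embeddings of class descriptions.
# All module sizes and names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyFusionClassifier(nn.Module):
    def __init__(self, hsi_bands=144, embed_dim=256, num_classes=15):
        super().__init__()
        # Separate lightweight encoders for each sensor (hypothetical).
        self.hsi_encoder = nn.Sequential(
            nn.Conv2d(hsi_bands, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, embed_dim))
        self.lidar_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, embed_dim))
        # Simple concatenation + projection as the fusion step.
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)
        # Placeholder class text embeddings; in a real system these would come
        # from a text encoder applied to LLM-generated category descriptions.
        self.text_embeddings = nn.Parameter(torch.randn(num_classes, embed_dim))

    def forward(self, hsi_patch, lidar_patch):
        v = self.fuse(torch.cat([self.hsi_encoder(hsi_patch),
                                 self.lidar_encoder(lidar_patch)], dim=-1))
        # Cosine similarity between fused visual features and class texts.
        v = F.normalize(v, dim=-1)
        t = F.normalize(self.text_embeddings, dim=-1)
        return v @ t.t()  # logits over land cover classes

# Example: a batch of two 11x11 patches from a 144-band HSI and a 1-channel DSM.
model = ToyFusionClassifier()
logits = model(torch.randn(2, 144, 11, 11), torch.randn(2, 1, 11, 11))
print(logits.shape)  # torch.Size([2, 15])
```

The design choice illustrated here is simply that classification reduces to visual-text similarity, so richer class descriptions can sharpen the decision boundary without changing the visual encoders.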

Keywords