International Journal of Applied Earth Observation and Geoinformation (Mar 2024)

VAM-Net: Vegetation-Attentive deep network for Multi-modal fusion of visible-light and vegetation-sensitive images

  • Yufu Zang,
  • Shuye Wang,
  • Haiyan Guan,
  • Daifeng Peng,
  • Jike Chen,
  • Yanming Chen,
  • Mahmoud R. Delavar

Journal volume & issue
Vol. 127
p. 103642

Abstract


Multi-modal fusion of remote sensing images is challenging because of the intricate imaging mechanisms and radiation variations across modalities, and the fusion of visible-light and vegetation-sensitive images faces the same difficulties. Traditional methods have seldom accounted for the differing imaging mechanisms and radiation differences between modalities, resulting in discrepancies in the corresponding features. To address this issue, we propose VAM-Net (Vegetation-Attentive Multi-modal deep Network), which combines a radiometric correction mechanism with a lightweight multi-modal adaptive feature selection method for fusing multi-modal images. First, the visible-band difference vegetation index (VDVI) is integrated into the visible-light images to mitigate the radiometric differences between visible-light and vegetation-sensitive images (e.g., infrared and red-edge images). Then, a two-branch network incorporating attention mechanisms is designed to independently capture texture features and select similar features across the two image modalities. Last, a new loss function is presented to ensure that the learned features are suitable for multi-modal fusion. VAM-Net is evaluated on visible-light and vegetation-sensitive images from three different areas; the experimental results show that it attains an average precision of 67.02%, an average recall of 35.49%, and an average RMSE of 2.191 px, demonstrating the accuracy and robustness of VAM-Net in multi-modal image fusion.
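
To make the radiometric-correction step concrete, the short Python sketch below computes the visible-band difference vegetation index mentioned in the abstract, VDVI = (2G - R - B) / (2G + R + B), and appends it to a visible-light image as an extra channel. This is an illustrative sketch only, not code from the paper; the array layout, the eps safeguard, and the channel-stacking strategy are assumptions made for the example.

    import numpy as np

    def vdvi(rgb, eps=1e-6):
        # Visible-band Difference Vegetation Index: (2G - R - B) / (2G + R + B).
        # `rgb` is an (H, W, 3) array ordered R, G, B; `eps` is an assumed
        # safeguard (not from the paper) to avoid division by zero.
        r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
        return (2.0 * g - r - b) / (2.0 * g + r + b + eps)

    # One plausible way to "integrate" VDVI into the visible-light input:
    # stack the index as a fourth channel alongside the RGB bands.
    img = np.random.rand(256, 256, 3)          # placeholder visible-light tile
    fused_input = np.dstack([img, vdvi(img)])  # shape (256, 256, 4)
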

Keywords