International Journal of Applied Earth Observation and Geoinformation (Mar 2024)

Multimodal deep fusion model based on Transformer and multi-layer residuals for assessing the competitiveness of weeds in farmland ecosystems

  • Zhaoxia Lou,
  • Longzhe Quan,
  • Deng Sun,
  • Fulin Xia,
  • Hailong Li,
  • Zhiming Guo

Journal volume & issue
Vol. 127
p. 103681

Abstract


Weed competitiveness monitoring is crucial for site-specific field management. Recent research on fusing multimodal data from unmanned aerial vehicles (UAVs) has propelled this advancement. However, these studies merely stack the extracted features with equal weight, leaving the fused information underutilized. This study uses hyperspectral and LiDAR data collected by UAVs to propose a multimodal deep fusion model (MulDFNet) based on Transformer and multi-layer residuals. It employs a comprehensive competitive index (CCI-A), derived from multidimensional maize phenotypes, to assess the competitiveness of weeds in farmland ecosystems. To validate the model, a series of ablation studies was conducted involving data from different modalities, with and without the Transformer Encoder (TE) modules, and with different fusion modules (shallow residual fusion module, deep feature fusion module). Additionally, the model was compared with early/late stacking fusion models, traditional machine learning models, and deep learning models from related studies. The results indicate that the multimodal deep fusion model using HSI, VI, and CHM data achieved a predictive performance of R2 = 0.903 (RMSE = 0.078). Notably, the best performance was observed at the five-leaf stage. The combination of shallow and deep fusion modules yielded better predictive performance than a single fusion module. The positive impact of the TE module on model performance is evident: its multi-head attention mechanism helps capture the relationships and relative importance between feature maps and competition indices, thereby enhancing the model's predictive capability. In weed competition prediction, the proposed multimodal deep fusion model demonstrated significantly better predictive performance than early/late stacking fusion models and other machine learning models (RF, SVR, PLS, DNN-F2 and Multi-channel CNN).
Overall, the multimodal deep fusion model developed in this study demonstrates outstanding performance in assessing weed competitiveness and can predict the competitive intensity of weeds in maize across various growth stages on a broad scale.
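The fusion pipeline the abstract describes (per-modality features passed through a shallow residual stage, then fused by Transformer-style multi-head attention before regressing a scalar competition index) can be sketched in toy form. The abstract does not specify MulDFNet's layer sizes, weights, or module internals, so every dimension, weight matrix, and the pooling/regression head below is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(tokens, num_heads, Wq, Wk, Wv):
    """Scaled dot-product attention over modality tokens, split into heads."""
    d = tokens.shape[1]
    dh = d // num_heads
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    out = np.zeros_like(tokens)
    for h in range(num_heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dh))
        out[:, s] = scores @ V[:, s]
    return out

d = 16  # assumed feature dimension per modality
# Three modality feature vectors (HSI, VI, CHM) treated as tokens.
hsi, vi, chm = (rng.standard_normal(d) for _ in range(3))
tokens = np.stack([hsi, vi, chm])  # (3, d)

# Shallow residual fusion: a learned transform plus an identity shortcut.
W_res = rng.standard_normal((d, d)) * 0.1
tokens = tokens + tokens @ W_res

# Deep fusion: multi-head attention lets each modality attend to the others,
# again with a residual connection around the attention block.
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
fused = tokens + multi_head_attention(tokens, num_heads=4, Wq=Wq, Wk=Wk, Wv=Wv)

# Pool across modalities and regress a scalar competition index (head assumed).
w_head = rng.standard_normal(d) * 0.1
cci_pred = float(fused.mean(axis=0) @ w_head)
print(fused.shape, cci_pred)
```

The multi-head split is what the abstract credits for relating feature maps to the competition index; in a real model the weight matrices would be trained against CCI-A labels rather than drawn at random.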

Keywords