Remote Sensing (Jun 2025)

MTCDNet: Multimodal Feature Fusion-Based Tree Crown Detection Network Using UAV-Acquired Optical Imagery and LiDAR Data

  • Heng Zhang
  • Can Yang
  • Xijian Fan

DOI
https://doi.org/10.3390/rs17121996
Journal volume & issue
Vol. 17, no. 12
p. 1996

Abstract

Accurate detection of individual tree crowns is a prerequisite for precisely extracting forest structural parameters, which in turn is vital for forest resource monitoring. Unmanned aerial vehicle (UAV)-acquired RGB imagery combined with deep learning-based networks has shown considerable potential, but existing methods often rely exclusively on RGB data, which makes them susceptible to shadows caused by varying illumination and leads to suboptimal performance in dense forest stands. In this paper, we propose integrating a LiDAR-derived Canopy Height Model (CHM) with RGB imagery as a complementary cue, shifting tree crown detection from a unimodal to a multimodal paradigm. To fully leverage the complementary properties of RGB and CHM, we present a novel Multimodal learning-based Tree Crown Detection Network (MTCDNet). Specifically, a transformer-based multimodal feature fusion strategy adaptively learns correlations among multilevel features from the RGB and CHM modalities, which enhances the model’s ability to represent tree crown structures. In addition, a learnable positional encoding scheme explicitly incorporates spatial information, helping the fused features capture complex, densely distributed tree crown structures. A hybrid loss function is further designed to improve the handling of occluded crowns and crowns of varying sizes. Experiments on two challenging datasets with diverse stand structures demonstrate that MTCDNet significantly outperforms existing state-of-the-art single-modality methods, achieving AP50 scores of 93.12% and 94.58% on the two datasets, respectively. Ablation studies further confirm that the proposed fusion network outperforms simple fusion strategies. These results indicate that effectively integrating RGB and CHM data offers a robust solution for individual tree crown detection.
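
The AP50 scores reported in the abstract follow the common object-detection convention in which a predicted box counts as a correct detection when its intersection over union (IoU) with a ground-truth crown is at least 0.5. The snippet below is a minimal sketch of that matching criterion only, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max). The function names are illustrative and do not come from the paper, and the full AP computation (ranking by confidence, precision-recall integration) is not shown.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes give zero intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_iou50(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: True if the prediction would count as a hit at IoU >= 0.5."""
    return box_iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    print(matches_at_iou50((2, 0, 12, 10), gt))  # overlap 80 / union 120
    print(matches_at_iou50((5, 0, 15, 10), gt))  # overlap 50 / union 150
```

In dense stands, neighbouring crowns overlap heavily, so a fixed 0.5 threshold is a fairly lenient localisation requirement; stricter thresholds would penalise loosely fitted boxes more.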

Keywords