Applied Sciences (Jun 2024)

Global–Local Deep Fusion: Semantic Integration with Enhanced Transformer in Dual-Branch Networks for Ultra-High Resolution Image Segmentation

  • Chenjing Liang,
  • Kai Huang,
  • Jian Mao

DOI
https://doi.org/10.3390/app14135443
Journal volume & issue
Vol. 14, no. 13
p. 5443

Abstract

The fusion of global contextual information with local cropped-block details is crucial for segmenting ultra-high-resolution images. In this study, a novel fusion mechanism termed global–local deep fusion (GL-Deep Fusion) is introduced, based on an enhanced transformer architecture that efficiently integrates global contextual information with local details. Specifically, we propose the global–local synthesis network (GLSNet), a dual-branch network in which one branch processes the entire original image while the other handles cropped local patches. Feature fusion across the two branches is achieved through GL-Deep Fusion, significantly improving the accuracy of ultra-high-resolution image segmentation; the model is particularly effective at identifying tiny overlapping objects. The dual-branch architecture is also designed to use GPU memory efficiently, feeding the features each branch extracts into the enhanced transformer framework of GL-Deep Fusion. Benchmarks on the DeepGlobe and Vaihingen datasets demonstrate the efficiency and accuracy of the proposed model: on DeepGlobe, it reduces GPU memory usage by 24.1% while improving segmentation accuracy by 0.8% over the baseline model; on Vaihingen, it achieves a mean F1 score of 90.2% and an mIoU of 90.9%, highlighting its memory efficiency and segmentation precision.
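The abstract contains no code, but the fusion it describes suggests a familiar cross-attention pattern. The PyTorch sketch below is only one plausible reading of such a block, with local-branch tokens as attention queries and global-branch tokens as keys and values; the class name GLDeepFusionBlock, the token shapes, and the single-block layout are assumptions for illustration, not the published implementation.

```python
import torch
import torch.nn as nn


class GLDeepFusionBlock(nn.Module):
    """Hypothetical sketch: local-patch tokens query global-context tokens
    via cross-attention, so each patch is refined with scene-level cues.
    Names, shapes, and layout are illustrative, not the authors' code."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)    # normalizes local (query) tokens
        self.norm_kv = nn.LayerNorm(dim)   # normalizes global (key/value) tokens
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_ffn = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, local_tokens: torch.Tensor,
                global_tokens: torch.Tensor) -> torch.Tensor:
        # local_tokens:  (B, N_local, C) from the cropped-patch branch
        # global_tokens: (B, N_global, C) from the downsampled full image
        q = self.norm_q(local_tokens)
        kv = self.norm_kv(global_tokens)
        attended, _ = self.cross_attn(q, kv, kv)   # local queries attend to global context
        x = local_tokens + attended                # residual: keep local detail
        x = x + self.ffn(self.norm_ffn(x))         # transformer-style feed-forward
        return x


if __name__ == "__main__":
    fusion = GLDeepFusionBlock(dim=256)
    local = torch.randn(2, 1024, 256)       # e.g., tokens from one high-resolution crop
    global_ctx = torch.randn(2, 256, 256)   # e.g., tokens from the downsampled whole image
    print(fusion(local, global_ctx).shape)  # torch.Size([2, 1024, 256])
```

In a design like this, the full image can be aggressively downsampled before tokenization while crops stay at native resolution, which is consistent with the paper's stated goal of cutting GPU memory without losing global context.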

Keywords