IET Image Processing (Sep 2024)

MDTrans: Multi‐scale and dual‐branch feature fusion network based on Swin Transformer for building extraction in remote sensing images

  • Kuo Diao,
  • Jinlong Zhu,
  • Guangjie Liu,
  • Meng Li

DOI
https://doi.org/10.1049/ipr2.13145
Journal volume & issue
Vol. 18, no. 11
pp. 2930–2942

Abstract


Effective extraction of buildings from remote sensing images requires both global and local information. Although convolutional neural networks (CNNs) excel at capturing local details, their intrinsic focus on local operations poses a challenge in extracting global features, especially for large‐scale buildings. Transformers, in contrast, excel at capturing global information, but compared to CNNs they tend to rely heavily on large‐scale datasets and pre‐trained parameters. To tackle these challenges, this paper presents the multi‐scale and dual‐branch feature fusion network (MDTrans). Specifically, the CNN and transformer branches are integrated in a dual‐branch parallel manner during both the encoding and decoding stages: local information for small‐scale buildings is extracted by Dense Connection Blocks in the CNN branch, while crucial global information for large‐scale buildings is captured by the Swin Transformer Block in the transformer branch. Additionally, a Dual Branch Information Fusion Block is designed to fuse the local and global features from the two branches, and a Multi‐Convolutional Block further enhances feature extraction for buildings of different sizes. In extensive experiments on the WHU, Massachusetts, and Inria building datasets, MDTrans achieves intersection over union (IoU) scores of 91.36%, 64.69%, and 79.25%, respectively, outperforming other state‐of‐the‐art models.
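The dual‐branch idea in the abstract can be illustrated schematically. The sketch below is not the paper's code: the CNN branch is stood in for by a 3×3 local mean filter (local detail), the transformer branch by a broadcast global mean (global context), and the Dual Branch Information Fusion Block by a simple gated sum. All function names and the gate parameter are hypothetical stand‐ins for the actual learned modules.

```python
# Minimal sketch of a dual-branch local/global feature fusion, in the
# spirit of MDTrans. All blocks here are toy stand-ins, not the paper's.
import numpy as np

def local_branch(x):
    """Stand-in for the CNN branch (Dense Connection Blocks):
    a 3x3 local mean filter that captures neighbourhood detail."""
    h, w = x.shape
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = x[max(0, i - 1):i + 2, max(0, j - 1):j + 2].mean()
    return out

def global_branch(x):
    """Stand-in for the transformer branch (Swin Transformer Block):
    broadcasts one global statistic to every position."""
    return np.full_like(x, x.mean())

def fuse(local_feat, global_feat, gate=0.5):
    """Stand-in for the Dual Branch Information Fusion Block:
    a fixed-gate weighted sum of the two feature maps."""
    return gate * local_feat + (1.0 - gate) * global_feat

x = np.arange(16, dtype=float).reshape(4, 4)  # toy single-channel feature map
fused = fuse(local_branch(x), global_branch(x))
```

In the actual network the gate and both branches are learned, and fusion happens at multiple scales in both the encoder and decoder; this sketch only shows why combining the two streams preserves local detail while injecting global context.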

Keywords