IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2021)
MHA-Net: Multipath Hybrid Attention Network for Building Footprint Extraction From High-Resolution Remote Sensing Imagery
Abstract
Deep learning approaches have been widely applied to building footprint extraction from high-resolution imagery. However, the traditional fully convolutional network still struggles to recover spatial details and to discriminate buildings of varying sizes and styles. We propose a novel multipath hybrid attention network (MHA-Net) to address these challenges. We design a separable convolution block attention module and an attention downsampling module as the basic modules, both built from separable convolutions and channel attention. The MHA-Net architecture consists of three components: the encoding network, multipath hybrid dilated convolution (HDC), and dense upsampling convolution (DUC). The encoding network encodes the high-level semantic context of images. The multipath HDC aggregates multiscale features by combining the rich semantic representations extracted by HDCs, which yields promising results in extracting tiny buildings. The DUC recovers precise spatial information of buildings. We evaluate our network on two public datasets: the WHU aerial building dataset and the Massachusetts building dataset. The experimental results show that MHA-Net outperforms other classical semantic segmentation models and several recent building extraction models. In particular, MHA-Net improves the extraction accuracy of small buildings and is robust to complicated building roofs.
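As a rough illustration of why hybrid dilated convolution mixes different dilation rates, the sketch below computes, in one dimension, which input positions contribute to a single output of a stack of stride-1 dilated convolutions. The dilation rates (1, 2, 5) and (2, 2, 2), the 3-tap kernel, and all function names are assumptions chosen for illustration; the abstract does not state the rates or layer counts used in MHA-Net.

```python
from itertools import product


def receptive_field(rates, kernel_size=3):
    """Width of the 1-D receptive field of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * r for r in rates)


def covered_offsets(rates, kernel_size=3):
    """Input offsets (relative to one output position) that actually contribute.

    Assumes an odd kernel size, so the taps are symmetric around zero.
    """
    half = kernel_size // 2
    taps = range(-half, half + 1)
    # Each layer shifts by tap * rate; the reachable offsets are all sums of shifts.
    return sorted({
        sum(t * r for t, r in zip(combo, rates))
        for combo in product(taps, repeat=len(rates))
    })


def has_gridding(rates, kernel_size=3):
    """True if some positions inside the receptive field are never sampled."""
    return len(covered_offsets(rates, kernel_size)) < receptive_field(rates, kernel_size)


if __name__ == "__main__":
    for rates in [(1, 2, 5), (2, 2, 2)]:  # illustrative rates, not from the paper
        print(rates, receptive_field(rates), covered_offsets(rates), has_gridding(rates))
```

Under these assumptions, the mixed rates cover every position inside their receptive field, whereas the repeated rate of 2 samples only every other position. That gap pattern is the gridding effect that mixed dilation rates are meant to avoid.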
Keywords