Building Precision: Efficient Encoder–Decoder Networks for Remote Sensing Based on Aerial RGB and LiDAR Data

Muhammad Sulaiman; Erik Finnesand; Mina Farmanbar; Ahmed Nabil Belbachir; Chunming Rong

doi:10.1109/ACCESS.2024.3391416

IEEE Access (Jan 2024)

Affiliations

Muhammad Sulaiman: Department of Electrical Engineering and Computer Science, University of Stavanger, Stavanger, Norway
Erik Finnesand: Department of Electrical Engineering and Computer Science, University of Stavanger, Stavanger, Norway
Mina Farmanbar: Department of Electrical Engineering and Computer Science, University of Stavanger, Stavanger, Norway
Ahmed Nabil Belbachir: NORCE Norwegian Research Centre, Bergen, Norway
Chunming Rong: Department of Electrical Engineering and Computer Science, University of Stavanger, Stavanger, Norway

Journal volume & issue: Vol. 12
pp. 60329–60346

Abstract

Precision in building delineation plays a pivotal role in population data analysis, city management, policy making, and disaster management. Leveraging computer vision technologies, particularly deep learning models for semantic segmentation, has proven instrumental in achieving accurate automatic building segmentation in remote sensing applications. However, current state-of-the-art (SOTA) techniques are not optimized for precisely extracting building footprints and, in particular, building boundaries. This deficiency highlights the need to combine Light Detection and Ranging (LiDAR) data with aerial RGB imagery and streamlined deep learning models for improved precision. This work uses the MapAI dataset, which includes a variety of objects beyond buildings, such as trees, electricity lines, solar panels, vehicles, and roads. These objects exhibit diverse colors and structures that resemble the rooftops in Denmark and Norway. To address these problems, this study modifies UNet and CT-UNet to segment buildings from LiDAR data and RGB images, and evaluates them with Intersection over Union (IoU), which measures building overlap, and Boundary Intersection over Union (BIoU), which measures how precisely building boundaries and shapes are recovered. The proposed work adapts the configuration of these networks to LiDAR data for efficient segmentation. Training batches are augmented to improve model generalization and reduce overfitting, and the inclusion of batch normalization further reduces overfitting. Four backbones with transfer learning are employed to improve the convergence and parameter efficiency of segmentation: ResNet50V2, DenseNet201, EfficientNetB4, and EfficientNetV2S. Test-Time Augmentation (TTA) is employed to improve the predicted masks. Experiments are performed with single and ensemble models, with and without augmentation. The ensemble model outperforms the single model, and TTA further improves the results. Using LiDAR data together with RGB improves the combined score (the average of IoU and BIoU) by 13.33% compared to using RGB images alone.
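The combined score reported above is the mean of IoU and BIoU. As a minimal sketch, assuming binary building masks and the common Boundary IoU formulation (IoU measured only over an inner band of width d pixels along each mask's contour), the three quantities can be computed with NumPy as follows. The helper names (`iou`, `erode`, `boundary_iou`, `combined_score`), the band width `d`, and the convention that two empty masks score 1.0 are illustrative assumptions, not the MapAI evaluation code.

```python
import numpy as np


def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union of two binary masks (nonzero = building)."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks are a perfect match
    return float(np.logical_and(pred, target).sum() / union)


def erode(mask: np.ndarray, iterations: int) -> np.ndarray:
    """Binary erosion with a 3x3 square element; pixels outside the image count as background."""
    out = mask.astype(bool)
    for _ in range(iterations):
        h, w = out.shape
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        # A pixel survives only if all 9 pixels of its 3x3 neighbourhood are foreground.
        eroded = np.ones_like(out)
        for dy in range(3):
            for dx in range(3):
                eroded &= padded[dy:dy + h, dx:dx + w]
        out = eroded
    return out


def boundary_iou(pred: np.ndarray, target: np.ndarray, d: int = 2) -> float:
    """IoU restricted to the inner band of width d along each mask's contour (d is an assumed value)."""
    pred, target = pred.astype(bool), target.astype(bool)
    pred_band = pred & ~erode(pred, d)
    target_band = target & ~erode(target, d)
    return iou(pred_band, target_band)


def combined_score(pred: np.ndarray, target: np.ndarray, d: int = 2) -> float:
    """Average of region IoU and boundary IoU."""
    return (iou(pred, target) + boundary_iou(pred, target, d)) / 2
```

For example, a 4×4 ground-truth square and the same square shifted one pixel to the right give IoU = 12/20 = 0.6 but, with d = 1, BIoU = 6/18 ≈ 0.33, so the combined score is about 0.47. The boundary term penalizes the misalignment far more than plain overlap does, which is why it is a useful complement when boundary precision matters.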
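The abstract states that Test-Time Augmentation improves the predicted masks and that an ensemble outperforms a single model, but it does not specify the transforms or how predictions are fused. As a hedged illustration only, the sketch below averages building probabilities over horizontal and vertical flips (flipping each prediction back before averaging) and then averages across models. The flip set, mean fusion, the 0.5 threshold, and the hypothetical `Model` interface (an (H, W, C) image in, an (H, W) probability map out) are assumptions, not the authors' configuration.

```python
from typing import Callable, Sequence

import numpy as np

# Hypothetical interface: maps an (H, W, C) image to an (H, W) map of building probabilities.
Model = Callable[[np.ndarray], np.ndarray]


def predict_with_tta(model: Model, image: np.ndarray) -> np.ndarray:
    """Average predictions over the identity, a horizontal flip, and a vertical flip."""
    views = [
        (lambda x: x, lambda y: y),                    # no augmentation
        (lambda x: x[:, ::-1], lambda y: y[:, ::-1]),  # horizontal flip, then flip the output back
        (lambda x: x[::-1, :], lambda y: y[::-1, :]),  # vertical flip, then flip the output back
    ]
    preds = [undo(model(apply(image))) for apply, undo in views]
    return np.mean(preds, axis=0)


def ensemble_predict(models: Sequence[Model], image: np.ndarray, use_tta: bool = True) -> np.ndarray:
    """Average the (optionally TTA-refined) probability maps of several models."""
    preds = [predict_with_tta(m, image) if use_tta else m(image) for m in models]
    return np.mean(preds, axis=0)


def to_mask(prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize a probability map into a building mask (threshold is an assumed value)."""
    return (prob >= threshold).astype(np.uint8)
```

Because every prediction is mapped back to the original orientation before averaging, a purely per-pixel model returns the same map with or without TTA; the benefit appears only for models, such as convolutional encoder–decoders, whose outputs are not exactly flip-equivariant.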

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
