Remote Sensing (Apr 2023)
On the Robustness and Generalization Ability of Building Footprint Extraction on the Example of SegNet and Mask R-CNN
Abstract
Building footprint (BFP) extraction focuses on the precise pixel-wise segmentation of buildings from aerial photographs such as satellite images. BFP extraction is an essential task in remote sensing and represents the foundation for many higher-level analysis tasks, such as disaster management, monitoring of city development, etc. Building footprint extraction is challenging because buildings can have different sizes, shapes, and appearances both in the same region and in different regions of the world. In addition, effects, such as occlusions, shadows, and bad lighting, have to also be considered and compensated. A rich body of work for BFP extraction has been presented in the literature, and promising research results have been reported on benchmarking datasets. Despite the comprehensive work performed, it is still unclear how robust and generalizable state-of-the-art methods are to different regions, cities, settlement structures, and densities. The purpose of this study is to close this gap by investigating questions on the practical applicability of BFP extraction. In particular, we evaluate the robustness and generalizability of state-of-the-art methods as well as their transfer learning capabilities. Therefore, we investigate in detail two of the most popular deep learning architectures for BFP extraction (i.e., SegNet, an encoder–decoder-based architecture and Mask R-CNN, an object detection architecture) and evaluate them with respect to different aspects on a proprietary high-resolution satellite image dataset as well as on publicly available datasets. Results show that both networks generalize well to new data, new cities, and across cities from different continents. They both benefit from increased training data, especially when this data is from the same distribution (data source) or of comparable resolution. Transfer learning from a data source with different recording parameters is not always beneficial.
Keywords