International Journal of Digital Earth (Dec 2023)
Strategies in training deep learning models to extract buildings from multisource images with small training sample sizes
Abstract
Building extraction from remote sensing data is an important topic in urban studies, and deep learning methods play an increasing role in it because they can reach outstanding performance with minimal training data. We aimed to investigate the original U-Net architecture’s efficiency in building segmentation with different numbers of training images, and the role of data augmentation, based on multisource remote sensing data with varying spatial and spectral resolutions (WorldView-2 [WV2] and WorldView-3 [WV3] images and an aerial orthophoto [ORTHO]). When training and prediction were conducted on the same image, U-Net provided good results with very few training images (validation accuracies: 94-97%; 192 images). Combining the training data of ORTHO and WV2 for prediction on WV3 provided poor results with a low F1-score (0.184). However, including only 48 WV3 training images significantly improved the F1-score (0.693); thus, most buildings were correctly identified. Accordingly, using only independent reference data (i.e. from images other than the target image) is not enough to train an accurate model. In our case, the reference data from the WV2 and ORTHO images did not provide an acceptable basis for training a good model, but a minimal number of training images from the target WV3 image improved the accuracy (F1-score: 69%).
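To illustrate the kind of setup the abstract describes (a U-Net trained for binary building segmentation on a small number of image tiles, with simple data augmentation), the sketch below shows a minimal Keras implementation. This is not the authors' code: the tile size (256 x 256), band count (4), augmentation operations, optimizer, and batch size are assumptions for illustration only; the paper's exact configuration may differ.

```python
# Minimal sketch (assumed configuration, not the authors' implementation):
# a compact U-Net for binary building segmentation trained on a small set
# of image tiles with simple geometric augmentation.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 4)):  # 4-band tiles (assumed)
    inputs = layers.Input(input_shape)
    # Encoder path
    c1 = conv_block(inputs, 32); p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 64);     p2 = layers.MaxPooling2D()(c2)
    c3 = conv_block(p2, 128);    p3 = layers.MaxPooling2D()(c3)
    # Bottleneck
    b = conv_block(p3, 256)
    # Decoder path with skip connections (the defining U-Net feature)
    u3 = layers.Concatenate()([layers.UpSampling2D()(b), c3]);  c4 = conv_block(u3, 128)
    u2 = layers.Concatenate()([layers.UpSampling2D()(c4), c2]); c5 = conv_block(u2, 64)
    u1 = layers.Concatenate()([layers.UpSampling2D()(c5), c1]); c6 = conv_block(u1, 32)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c6)     # binary building mask
    return models.Model(inputs, outputs)

def augment(image, mask):
    """Geometric augmentation applied identically to image and mask."""
    if tf.random.uniform(()) > 0.5:
        image, mask = tf.image.flip_left_right(image), tf.image.flip_left_right(mask)
    if tf.random.uniform(()) > 0.5:
        image, mask = tf.image.flip_up_down(image), tf.image.flip_up_down(mask)
    k = tf.random.uniform((), 0, 4, dtype=tf.int32)
    return tf.image.rot90(image, k), tf.image.rot90(mask, k)

# Random arrays standing in for a small training set (e.g. 192 tiles)
# and their rasterized building masks.
x = np.random.rand(192, 256, 256, 4).astype("float32")
y = (np.random.rand(192, 256, 256, 1) > 0.5).astype("float32")
ds = (tf.data.Dataset.from_tensor_slices((x, y))
      .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(8))

model = build_unet()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(ds, epochs=1)  # short run just to demonstrate the training loop
```

In this sketch, extending the model to a cross-sensor experiment such as the one reported in the abstract would simply mean mixing tiles from different sources (ORTHO, WV2, WV3) in the training dataset and predicting on tiles from the target image.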
Keywords