Scientific Data (Feb 2024)

A benchmark GaoFen-7 dataset for building extraction from satellite images

  • Peimin Chen,
  • Huabing Huang,
  • Feng Ye,
  • Jinying Liu,
  • Weijia Li,
  • Jie Wang,
  • Zixuan Wang,
  • Chong Liu,
  • Ning Zhang

DOI
https://doi.org/10.1038/s41597-024-03009-5
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 15

Abstract

Read online

Abstract Accurate building extraction is crucial for urban understanding, but it often requires a substantial number of building samples. While some building datasets are available for model training, there remains a lack of high-quality building datasets covering urban and rural areas in China. To fill this gap, this study creates a high-resolution GaoFen-7 (GF-7) Building dataset utilizing the Chinese GF-7 imagery from six Chinese cities. The dataset comprises 5,175 pairs of 512 × 512 image tiles, covering 573.17 km2. It contains 170,015 buildings, with 84.8% of the buildings in urban areas and 15.2% in rural areas. The usability of the GF-7 Building dataset has been proved with seven convolutional neural networks, all achieving an overall accuracy (OA) exceeding 93%. Experiments have shown that the GF-7 building dataset can be used for building extraction in urban and rural scenarios. The proposed dataset boasts high quality and high diversity. It supplements existing building datasets and will contribute to promoting new algorithms for building extraction, as well as facilitating intelligent building interpretation in China.