Journal of Remote Sensing (Jan 2023)

Globe230k: A Benchmark Dense-Pixel Annotation Dataset for Global Land Cover Mapping

  • Qian Shi,
  • Da He,
  • Zhengyu Liu,
  • Xiaoping Liu,
  • Jingqian Xue

DOI
https://doi.org/10.34133/remotesensing.0078
Journal volume & issue
Vol. 3

Abstract

Read online

Global land cover map provides fundamental information for understanding the relationship between global environmental change and human settlement. With the development of data-driven deep learning theory, semantic segmentation network has largely facilitated the global land cover mapping activity. However, the performance of semantic segmentation network is closely related to the number and quality of training data, and the existing annotation data are usually insufficient in quantity, quality, and spatial resolution, and are usually sampled at local region and lack diversity and variability, making data-driven model difficult to extend to global scale. Therefore, we proposed a large-scale annotation dataset (Globe230k) for semantic segmentation of remote sensing image, which has 3 superiorities: (a) large scale: the Globe230k dataset includes 232,819 annotated images with a size of 512 × 512 and a spatial resolution of 1 m, including 10 first-level categories; (b) rich diversity: the annotated images are sampled from worldwide regions, with coverage area of over 60,000 km2, indicating a high variability and diversity; (c) multimodal: the Globe230k dataset not only contains RGB bands but also includes other important features for Earth system research, such as normalized differential vegetation index (NDVI), digital elevation model (DEM), vertical–vertical polarization (VV) bands, and vertical–horizontal polarization (VH) bands, which can facilitate the multimodal data fusion research. We used the Globe230k dataset to test several state-of-the-art semantic segmentation algorithms and found that it is able to evaluate algorithms in multiple aspects that are crucial for characterizing land covers, including multiscale modeling, detail reconstruction, and generalization ability. The dataset has been made public and can be used as a benchmark to promote further development of global land cover mapping and semantic segmentation algorithm development.