International Journal of Applied Earth Observation and Geoinformation (Apr 2024)

SpatialScene2Vec: A self-supervised contrastive representation learning method for spatial scene similarity evaluation

  • Danhuai Guo,
  • Yingxue Yu,
  • Shiyin Ge,
  • Song Gao,
  • Gengchen Mai,
  • Huixuan Chen

Journal volume & issue
Vol. 128
p. 103743

Abstract


Spatial scene similarity plays a crucial role in spatial cognition, as it enables us to understand and compare different spatial scenes and their relationships. However, understanding spatial scenes is a complex task. While the existing literature has contributed to spatial scene representation learning, these methods primarily focus on capturing the spatial relationships among objects and often neglect their semantic features. Furthermore, there is a lack of scene representation learning methods that can seamlessly handle different types of spatial objects (e.g., points, polylines, and polygons) in a scene. Moreover, because annotating spatial scenes requires expert knowledge, publicly available high-quality annotated data is limited in size, which usually leads to suboptimal results. To address these issues, we propose a novel multi-scale spatial scene encoding model called SpatialScene2Vec. SpatialScene2Vec uses a point location encoder to seamlessly encode the spatial information of different types of spatial objects and a point feature encoder to encode their semantic features; a spatial scene embedding is generated by integrating the spatial embeddings and feature embeddings of the objects within the scene. Furthermore, to address the limited-labeled-data problem, we propose a self-supervised learning framework that trains the SpatialScene2Vec model with a contrastive loss for spatial scene similarity evaluation. In addition, we introduce a novel spatial scene data augmentation method that generates positive scene augmentations by leveraging the unique characteristics of spatial scenes and randomly sampling points based on the shapes of the polyline/polygon objects within the current scene. We conduct experiments on real-world datasets for spatial scene retrieval tasks covering point, polyline, and polygon vector data. Results show that SpatialScene2Vec significantly and robustly outperforms well-established encoding methods such as Space2Vec, owing to its integrated multi-scale representations and the proposed spatial scene data augmentation method.
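To make the described pipeline concrete, the following is a minimal sketch of the general idea rather than the authors' released implementation: each object receives a multi-scale sinusoidal location embedding in the spirit of Space2Vec plus a semantic feature embedding, per-object embeddings are pooled into a scene embedding, and two augmented views of the same scene are trained as a positive pair with an NT-Xent contrastive loss (a common choice in contrastive learning; the paper's exact loss and encoder details may differ). All class, function, and parameter names here (SinusoidalLocationEncoder, SceneEncoder, nt_xent, num_scales, dim) are hypothetical.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SinusoidalLocationEncoder(nn.Module):
    """Multi-scale sinusoidal encoding of 2-D coordinates
    (a simplified, Space2Vec-style illustration)."""
    def __init__(self, dim=64, num_scales=16, min_lambda=1.0, max_lambda=10000.0):
        super().__init__()
        scales = torch.logspace(math.log10(min_lambda), math.log10(max_lambda), num_scales)
        self.register_buffer("freqs", 2 * math.pi / scales)  # one frequency per scale
        self.proj = nn.Linear(num_scales * 4, dim)           # sin + cos for x and y

    def forward(self, xy):                                   # xy: (N, 2)
        phase = xy.unsqueeze(-1) * self.freqs                # (N, 2, S)
        enc = torch.cat([phase.sin(), phase.cos()], dim=-1)  # (N, 2, 2S)
        return self.proj(enc.flatten(1))                     # (N, dim)

class SceneEncoder(nn.Module):
    """Combines location and semantic-feature embeddings per object,
    then mean-pools over objects into a single scene embedding."""
    def __init__(self, num_feature_types, dim=64):
        super().__init__()
        self.loc_enc = SinusoidalLocationEncoder(dim=dim)
        self.feat_enc = nn.Embedding(num_feature_types, dim)
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, xy, feat_ids):                         # xy: (N, 2), feat_ids: (N,)
        obj = self.loc_enc(xy) + self.feat_enc(feat_ids)     # per-object embedding
        return self.head(obj.mean(dim=0, keepdim=True))      # (1, dim) scene embedding

def nt_xent(z1, z2, tau=0.1):
    """NT-Xent contrastive loss over a batch of positive scene pairs."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)              # (2B, dim)
    sim = z @ z.T / tau
    sim.fill_diagonal_(float("-inf"))                        # exclude self-similarity
    B = z1.size(0)
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(B)])
    return F.cross_entropy(sim, targets)

# Usage: two augmented views of the same scene form a positive pair.
# (In practice many scenes would be batched; B=1 here for brevity.)
enc = SceneEncoder(num_feature_types=32)
xy_a, xy_b = torch.rand(10, 2), torch.rand(10, 2)            # stand-ins for two views
ids = torch.randint(0, 32, (10,))
loss = nt_xent(enc(xy_a, ids), enc(xy_b, ids))
```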
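The shape-based augmentation can be sketched in the same spirit: assuming shapely geometries, a polyline (or a polygon's outer ring) is turned into a point set by sampling at random distances along it, so two independent samplings of the same scene serve as a positive pair. The helper name and parameters are illustrative, not from the paper.

```python
import random
from shapely.geometry import LineString, Polygon

def sample_points_on_geometry(geom, n_points, rng=random):
    """Randomly sample points along a polyline or a polygon's boundary.

    Two independent calls on the same scene yield two point-based
    'views' usable as a positive pair for contrastive training.
    """
    if isinstance(geom, Polygon):
        line = geom.exterior          # sample along the polygon's outer ring
    elif isinstance(geom, LineString):
        line = geom
    else:
        raise TypeError(f"unsupported geometry type: {geom.geom_type}")
    pts = [line.interpolate(rng.uniform(0.0, line.length)) for _ in range(n_points)]
    return [(p.x, p.y) for p in pts]

# Two stochastic samplings of the same polyline -> a positive pair
road = LineString([(0, 0), (2, 1), (5, 1)])
view_a = sample_points_on_geometry(road, n_points=8)
view_b = sample_points_on_geometry(road, n_points=8)
```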
