ISPRS International Journal of Geo-Information (Jan 2022)

A System for Aligning Geographical Entities from Large Heterogeneous Sources

  • André Melo
  • Btissam Er-Rahmadi
  • Jeff Z. Pan

DOI
https://doi.org/10.3390/ijgi11020096
Journal volume & issue
Vol. 11, no. 2
p. 96

Abstract

Aligning points of interest (POIs) from heterogeneous geographical data sources is an important task that helps extend map data with information from different datasets. This task poses several challenges, including differences in type hierarchies, differences in labels (formats, languages, and levels of detail), and deviations in the coordinates. Scalability is another major issue, as global-scale datasets may have tens or hundreds of millions of entities. In this paper, we propose the GeographicaL Entities AligNment (GLEAN) system for efficiently matching large geographical datasets based on spatial partitioning with an adaptable margin. In particular, we introduce a text similarity measure based on the local-context relevance of tokens, used in combination with sentence embeddings. We also propose a scalable type embedding model. Finally, we demonstrate that the proposed system can efficiently handle the alignment of large datasets while improving alignment quality through the proposed entity similarity measure.
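
As a rough illustration of spatial partitioning with a margin, the sketch below generates candidate POI pairs on a regular latitude/longitude grid. It is not the GLEAN implementation: the function names (`cell_of`, `cells_with_margin`, `candidate_pairs`), the grid layout, and the fixed `margin` parameter are assumptions made for this example, whereas the abstract describes the margin as adaptable. Each target entity is copied into every cell that its margin-expanded position touches, so two nearby entities on opposite sides of a cell border still end up as a candidate pair.

```python
from collections import defaultdict
from math import floor


def cell_of(lat, lon, size):
    """Return the (row, col) index of the grid cell containing a point."""
    return (floor(lat / size), floor(lon / size))


def cells_with_margin(lat, lon, size, margin):
    """Return every cell overlapped by the point's margin-expanded box."""
    lo_row, lo_col = cell_of(lat - margin, lon - margin, size)
    hi_row, hi_col = cell_of(lat + margin, lon + margin, size)
    return [(r, c)
            for r in range(lo_row, hi_row + 1)
            for c in range(lo_col, hi_col + 1)]


def candidate_pairs(source, target, size=0.01, margin=0.001):
    """Pair each source entity with the target entities sharing its cell.

    `source` and `target` map entity ids to (lat, lon) tuples in degrees.
    `size` and `margin` are hypothetical parameters, also in degrees.
    """
    buckets = defaultdict(list)
    # Index each target entity under all cells its margin box overlaps.
    for tid, (lat, lon) in target.items():
        for cell in cells_with_margin(lat, lon, size, margin):
            buckets[cell].append(tid)
    pairs = set()
    # Each source entity is looked up only in its own cell.
    for sid, (lat, lon) in source.items():
        for tid in buckets.get(cell_of(lat, lon, size), []):
            pairs.add((sid, tid))
    return pairs
```

In a full pipeline, only the pairs returned here would be scored with the more expensive label and type similarity measures, which is what keeps the comparison count manageable at global scale. A larger margin tolerates larger coordinate deviations at the cost of more candidate pairs.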

Keywords