Journal of Spatial Information Science (Jun 2024)
Textual geolocation in Hebrew: mapping challenges via natural place description analysis
Abstract
Describing where a place is situated is an innate communication skill that relies on spatial cognition, spatial reasoning, and linguistic systems. Accordingly, textual geolocation, the task of retrieving the coordinates of a place from linguistic descriptions, requires computerized spatial inference and natural language understanding. Yet machine-based textual geolocation is currently limited, mainly because the rich geo-textual datasets needed to train natural language models are lacking; as a result, these models cannot adequately interpret such language-based expressions. These limitations are intensified in morphologically rich and resource-poor languages, such as Hebrew. This study aims to analyze and understand the linguistic systems used for place descriptions in Hebrew, as a basis for later training machine learning natural language models. A novel crowdsourced geo-textual dataset is developed, composed of 5,695 written place descriptions provided by 1,554 native Hebrew speakers. All place descriptions rely on memory only, which increases spatial vagueness and requires referring expression resolution. Qualitative linguistic analysis of the place descriptions shows that geospatial reasoning is used extensively in Hebrew, while empirical analysis with textual geolocation engines indicates that literal descriptions pose challenges for existing methods: they require a real understanding of space and geospatial references, and cannot be geolocated simply by matching gazetteer entries against geo-entities extracted from the text. The findings offer an improved understanding of the challenges entailed in natural language processing of Hebrew geolocation, and contribute to formalizing the computerized systems to be used in future machine learning models for complex geographic information retrieval tasks.
Keywords