GIScience & Remote Sensing (Dec 2024)
Urban region representation learning with human trajectories: a multi-view approach incorporating transition, spatial, and temporal perspectives
Abstract
Mining latent information from human trajectories to understand our cities has been a persistent endeavor in urban studies and spatial information science. Many previous studies relied on manually crafted features and followed a supervised learning pipeline for a particular task, e.g. land use classification. However, such methods often overlook some types of latent information and the commonalities between varying urban sensing tasks, making the features engineered for one specific task sometimes not useful in other tasks. To tackle these limitations, we propose a multi-view trajectory embedding (MTE) approach to learn the features of urban regions (region representations) in an unsupervised manner, which does not rely on a specific task and thus can be generalized to varying urban sensing tasks. Specifically, MTE incorporates three salient information views carried by human trajectories, i.e. transition, spatial, and temporal views. We utilize skip-gram to model human transition patterns exhibited in massive amounts of human trajectories, where long-range dependency is meaningful. Subsequently, we leverage unsupervised graph representation learning to model spatial adjacency and temporal pattern similarities, where short-range dependency is favorable. We perform extensive experiments on three downstream tasks, i.e. land use classification, population density estimation, and house price prediction. The results indicate that MTE considerably outperforms a series of competitive baselines in all three tasks, and different information views have varying levels of effectiveness in particular downstream tasks, e.g. the temporal view is more effective than the spatial view in land use classification, while it is the opposite in house price prediction.
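The transition view described above treats each trajectory as a "sentence" of visited region IDs and applies skip-gram to it. The sketch below illustrates this idea with a minimal skip-gram (negative sampling) in NumPy; the toy trajectories, region count, and all hyperparameters are invented for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input: each trajectory is a sequence of visited region IDs
# (assumption: raw trajectories are first mapped to region-ID sequences).
trajectories = [
    [0, 1, 2, 1, 0],
    [2, 3, 4, 3, 2],
    [0, 1, 2, 3, 4],
]
n_regions, dim, window, lr = 5, 8, 2, 0.05

# Separate "target" and "context" embedding tables, as in word2vec.
W_in = rng.normal(scale=0.1, size=(n_regions, dim))
W_out = rng.normal(scale=0.1, size=(n_regions, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Skip-gram with negative sampling: for each (target, context) pair
# within the window, raise the score of the true context region and
# lower the scores of a few randomly drawn "negative" regions.
for _ in range(200):
    for traj in trajectories:
        for i, target in enumerate(traj):
            lo, hi = max(0, i - window), min(len(traj), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                context = traj[j]
                negatives = rng.integers(0, n_regions, size=2)
                pairs = [(context, 1.0)] + [(n, 0.0) for n in negatives]
                for region, label in pairs:
                    score = sigmoid(W_in[target] @ W_out[region])
                    grad = score - label  # gradient of the logistic loss
                    g_out = grad * W_in[target]
                    W_in[target] -= lr * grad * W_out[region]
                    W_out[region] -= lr * g_out

# Rows of W_in serve as the transition-view region embeddings.
region_embeddings = W_in
print(region_embeddings.shape)
```

In MTE these transition embeddings would then be combined with the spatial and temporal views, which the paper models with unsupervised graph representation learning rather than skip-gram.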
Keywords