IEEE Access (Jan 2024)

Sparse Transformer Network With Spatial-Temporal Graph for Pedestrian Trajectory Prediction

  • Long Gao,
  • Xiang Gu,
  • Feng Chen,
  • Jin Wang

DOI
https://doi.org/10.1109/ACCESS.2024.3442442
Journal volume & issue
Vol. 12
pp. 144725 – 144737

Abstract

Pedestrian trajectory prediction is a key technology in surveillance systems and autonomous driving. However, because pedestrian movement is highly uncertain and its spatial-temporal dependencies are dynamic, timely and accurate trajectory prediction, especially long-term prediction, remains an open challenge. Existing models also lack effective methods for modeling temporal dependence and spatial interaction. To address these problems, we propose Sparse Transformer Networks with Spatial-Temporal Graph (STGSTN), which captures complex spatial-temporal interactions by stacking spatial-temporal Transformer blocks and improves prediction accuracy by combining dynamic spatial dependence with long-range temporal dependence. We propose a new variant of the sparse spatial Transformer combined with graph neural networks, which uses self-attention to dynamically model spatial dependencies and capture the state of pedestrian movement. Multi-head attention is also used to jointly model various patterns of spatial dependence. In addition, a sparse temporal Transformer learns a sparse attention map for temporal modeling and then models the long-range temporal dependence between pedestrians. Compared with existing work, STGSTN can learn long-range spatial-temporal dependencies effectively and efficiently. Experimental results show that STGSTN is competitive with state-of-the-art techniques on the ETH-UCY dataset, especially for long-term prediction. The performance of STGSTN may be limited by the quality of the training data, and its generalization to highly dynamic or crowded environments remains a challenge. Future work will focus on addressing these limitations and on exploring additional contextual information to further improve prediction accuracy.
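
The abstract does not say how the attention maps are made sparse. As a rough illustration of the general idea only, the sketch below applies single-head self-attention over the pedestrians in one frame and keeps just the largest scores in each row before the softmax. The top-k rule, the function name `sparse_self_attention`, the parameter `top_k`, and all sizes are assumptions made for this example and are not taken from STGSTN.

```python
import numpy as np

def sparse_self_attention(x, w_q, w_k, w_v, top_k):
    """Single-head self-attention that keeps only the top_k scores per row.

    Hypothetical helper: top-k masking is an assumed sparsification rule,
    not the method described in the paper.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (n, n) pairwise scores
    # Per-row threshold: the top_k-th largest score in that row.
    kth = np.sort(scores, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)  # drop weaker interactions
    masked = masked - masked.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(masked)                           # exp(-inf) is exactly 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
n, d = 5, 8  # 5 pedestrians, 8-dim embeddings (illustrative sizes)
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = sparse_self_attention(x, w_q, w_k, w_v, top_k=2)
```

Each row of `attn` is a probability distribution with at most `top_k` non-zero entries, so every pedestrian attends to only a few neighbours. The same masking could in principle be applied along the time axis to obtain a sparse temporal attention map, and a multi-head version would run several such heads with separate projections and concatenate their outputs.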

Keywords