Complex & Intelligent Systems (Jun 2024)

MDSTF: a multi-dimensional spatio-temporal feature fusion trajectory prediction model for autonomous driving

  • Xing Wang,
  • Zixuan Wu,
  • Biao Jin,
  • Mingwei Lin,
  • Fumin Zou,
  • Lyuchao Liao

DOI
https://doi.org/10.1007/s40747-024-01490-4
Journal volume & issue
Vol. 10, no. 5
pp. 6647–6665

Abstract

In the field of autonomous driving, trajectory prediction of traffic agents is an important and challenging problem. Fully capturing the complex spatio-temporal features in trajectory data is crucial for accurate trajectory prediction. This paper proposes a trajectory prediction model called multi-dimensional spatio-temporal feature fusion (MDSTF), which integrates multi-dimensional spatio-temporal features to model the trajectory information of traffic agents. In the spatial dimension, we employ a graph convolutional network (GCN) to capture the local spatial features of traffic agents, a spatial attention mechanism to capture their global spatial features, and an LSTM combined with spatial attention to capture their full-process spatial features. These three spatial features are then fused using a gate fusion mechanism. Moreover, while modeling the full-process spatial features, the LSTM also captures short-term temporal dependencies in the trajectory information. In the temporal dimension, we utilize a Transformer-based encoder to extract long-term temporal dependencies, which are then fused with the short-term temporal dependencies captured by the LSTM. Finally, we employ two temporal convolutional networks (TCN) to predict trajectories from the fused spatio-temporal features. Experimental results on the ApolloScape trajectory dataset demonstrate that our proposed method outperforms state-of-the-art methods on the weighted sum of average displacement error (WSADE) and weighted sum of final displacement error (WSFDE) metrics, achieving reductions of 4.37% and 6.23%, respectively, compared to the best baseline model (S2TNet).
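The abstract describes fusing three spatial feature streams (local GCN features, global attention features, and full-process LSTM features) via a gate fusion mechanism. The PyTorch sketch below illustrates one plausible form of such gated fusion, where learned softmax gates weight the three streams per agent. All module names, tensor shapes, and the softmax-gate design are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class GateFusion(nn.Module):
    """Fuse three equally shaped feature streams with learned gates.

    Hypothetical sketch of the gate fusion step named in the abstract:
    the three spatial branches (GCN / attention / LSTM) are assumed to
    produce per-agent features of the same dimensionality.
    """

    def __init__(self, dim: int):
        super().__init__()
        # One gate score per stream, computed from the concatenated features.
        self.gate = nn.Linear(3 * dim, 3)

    def forward(self, local_f, global_f, full_f):
        # Each input: (batch, agents, dim)
        stacked = torch.stack([local_f, global_f, full_f], dim=-2)    # (B, A, 3, D)
        scores = self.gate(torch.cat([local_f, global_f, full_f], dim=-1))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)         # (B, A, 3, 1)
        # Convex combination of the three streams per agent.
        return (weights * stacked).sum(dim=-2)                        # (B, A, D)


# Usage: fuse per-agent features from the three spatial branches.
fusion = GateFusion(dim=64)
b, a, d = 8, 10, 64
fused = fusion(torch.randn(b, a, d), torch.randn(b, a, d), torch.randn(b, a, d))
print(fused.shape)  # torch.Size([8, 10, 64])
```

A softmax over per-stream scores keeps the fused output a convex combination of the branches; the paper's actual gates (e.g., sigmoid element-wise gating) may differ.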

Keywords