IEEE Access (Jan 2024)

METF: Modeling Macro-Micro Human Intention by Multi-Encoder Transformer for Human Trajectory Prediction

  • Xincheng Hu,
  • Bo Yang,
  • Jixing Yang,
  • Teng Zhang

DOI
https://doi.org/10.1109/ACCESS.2024.3426958
Journal volume & issue
Vol. 12
pp. 96193 – 96204

Abstract

Human trajectory prediction has applications in many fields, such as autonomous driving and social robotics. The main challenge arises from the fact that pedestrians, while walking, follow their own intended route yet constantly account for their spatial and temporal interactions with other pedestrians to avoid collisions. However, most existing state-of-the-art models either overlook the balance between a pedestrian's own path and their interactions with others, or focus solely on one of these aspects. We posit that effective pedestrian trajectory prediction should incorporate both macro and micro perspectives. In this paper, we propose a Multi-Encoder TransFormer network (METF), which balances microscopic and macroscopic information. First, we propose a multi-encoder architecture that simultaneously encodes macroscopic and microscopic information and allocates different degrees of importance to each. Then, we introduce a graph attention mechanism to capture the interactions between pedestrians at each moment, along with an attention module that learns the temporal dependence of these interactions over a long time range. We also redesign the input, output, and computation of the transformer decoder for the trajectory prediction problem, reducing computational cost while maintaining accuracy. Comparing against a wide range of methods, we found that METF achieves superior performance on two publicly available datasets (ETH and UCY), producing trajectories that align more closely with pedestrian social walking patterns. Ablation experiments illustrate the effectiveness of the designs of the various parts of METF.
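The abstract's graph attention mechanism over pedestrians at a single time step can be sketched as follows. This is an illustrative reconstruction in the style of standard graph attention (GAT), not the authors' implementation; the dimensions, the LeakyReLU slope, and the parameter names `W` and `a` are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention(h, W, a, slope=0.2):
    """One graph-attention step over pedestrians at a single moment.

    h : (N, d)  per-pedestrian embeddings (N pedestrians in the scene)
    W : (d, d') shared linear projection
    a : (2*d',) attention vector scoring each ordered pair (i, j)
    Returns (N, d') interaction-aware embeddings.
    """
    z = h @ W                                  # project: (N, d')
    N = z.shape[0]
    # Pairwise logits e_ij = LeakyReLU(a . [z_i || z_j]).
    logits = np.array([[np.concatenate([z[i], z[j]]) @ a
                        for j in range(N)] for i in range(N)])
    logits = np.where(logits > 0, logits, slope * logits)
    alpha = softmax(logits, axis=1)            # attention over neighbours j
    return alpha @ z                           # weighted aggregation

# Toy scene: 4 pedestrians with 8-dim features (random weights for illustration).
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 8))
a = rng.normal(size=(16,))
out = graph_attention(h, W, a)
print(out.shape)  # (4, 8)
```

In METF, embeddings like `out` would be produced at every observed time step and then fed to the temporal attention module that models interaction dependence across moments.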
