Ecological Informatics (Nov 2024)

A joint time and spatial attention-based transformer approach for recognizing the behaviors of wild giant pandas

  • Jing Liu,
  • Jin Hou,
  • Dan Liu,
  • Qijun Zhao,
  • Rui Chen,
  • Xiaoyuan Chen,
  • Vanessa Hull,
  • Jindong Zhang,
  • Jifeng Ning

Journal volume & issue
Vol. 83
p. 102797

Abstract

Wild giant pandas, an endangered species endemic to China, are a focus of conservation efforts. The behavior of giant pandas reflects their health condition and activity capability, which play an important role in formulating and implementing conservation measures. Efficient deep-learning-based behavior recognition methods can therefore significantly advance the study of wild giant panda behavior. This study introduces, for the first time, a transformer-based behavior recognition method termed PandaFormer, which employs joint time and spatial attention to analyze behavioral temporal patterns and estimate activity spaces. The method integrates cross-fusion recurrent time encoding with transformer modules to handle both the temporal dynamics and the spatial relationships within panda behavior videos. First, we design a cross-fusion recurrent time encoding to represent the occurrence time of behaviors effectively; leveraging the multimodal processing capability of the transformer, we feed time and video tokens into the transformer module to explore the relationship between behavior and its occurrence time. Second, we introduce relative temporal weights between video frames so that the model can learn sequential relationships. Finally, because the camera position is fixed during recording, we propose a spatial attention mechanism based on estimating the panda's activity area. To validate the model, we constructed a video dataset of wild giant pandas covering five typical behaviors and evaluated the proposed method on this video-level annotated dataset. It achieves a Top-1 accuracy of 92.25% and a mean class precision of 91.19%, surpassing state-of-the-art behavior recognition algorithms by a large margin. Ablation experiments further validate the effectiveness of the proposed temporal and spatial attention mechanisms. In conclusion, the proposed method offers an effective way to study panda behavior and holds potential for application to other wildlife species.
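To make the abstract's architecture concrete, the sketch below illustrates the general idea of fusing an occurrence-time token with video tokens in a transformer encoder while a spatial mask emphasizes an estimated activity area. This is not the authors' PandaFormer code: the simple sine/cosine hour encoding stands in for the paper's cross-fusion recurrent time encoding, the relative temporal weighting is omitted, and all class, function, and variable names (e.g. TimeSpatialAttentionSketch, activity_mask) are hypothetical.

```python
# Minimal sketch (assumed, not the published implementation) of joint
# time and spatial attention for behavior classification.
import torch
import torch.nn as nn


class TimeSpatialAttentionSketch(nn.Module):
    def __init__(self, embed_dim=256, num_heads=4, num_classes=5):
        super().__init__()
        # Project a cyclic (sin, cos) hour-of-day encoding to a single "time token"
        # living in the same embedding space as the video patch tokens.
        self.time_proj = nn.Linear(2, embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, video_tokens, hour_of_day, activity_mask):
        # video_tokens:  (B, N, D) embeddings of spatio-temporal video patches
        # hour_of_day:   (B,) behavior occurrence time in [0, 24)
        # activity_mask: (B, N) weights in [0, 1] estimating the panda's activity area
        angle = 2 * torch.pi * hour_of_day / 24.0
        time_feat = torch.stack([torch.sin(angle), torch.cos(angle)], dim=-1)
        time_token = self.time_proj(time_feat).unsqueeze(1)        # (B, 1, D)

        # Spatial attention: down-weight patches outside the estimated activity area.
        video_tokens = video_tokens * activity_mask.unsqueeze(-1)

        # Joint attention over the time token and the masked video tokens.
        tokens = torch.cat([time_token, video_tokens], dim=1)      # (B, 1+N, D)
        encoded = self.encoder(tokens)
        return self.classifier(encoded[:, 0])                      # classify from time token


# Usage example: 8 clips, 196 patch tokens each, 5 behavior classes.
model = TimeSpatialAttentionSketch()
logits = model(torch.randn(8, 196, 256), torch.rand(8) * 24, torch.ones(8, 196))
print(logits.shape)  # torch.Size([8, 5])
```

Feeding the time embedding as an extra token lets standard self-attention relate occurrence time to visual content without a bespoke fusion head, which is the spirit of the multimodal token fusion described in the abstract.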

Keywords