Communications Engineering (Sep 2024)

Localization and recognition of human action in 3D using transformers

  • Jiankai Sun,
  • Linjiang Huang,
  • Hongsong Wang,
  • Chuanyang Zheng,
  • Jianing Qiu,
  • Md Tauhidul Islam,
  • Enze Xie,
  • Bolei Zhou,
  • Lei Xing,
  • Arjun Chandrasekaran,
  • Michael J. Black

DOI
https://doi.org/10.1038/s44172-024-00272-7
Journal volume & issue
Vol. 3, no. 1
pp. 1 – 15

Abstract

Understanding a person’s behavior from their 3D motion sequence is a fundamental problem in computer vision with many applications. An important component of this problem is 3D action localization, which involves recognizing what actions a person is performing and when those actions occur in the sequence. To promote progress in 3D action localization, we introduce a new, challenging, and more complex benchmark dataset, BABEL-TAL (BT). Important baselines and evaluation metrics, as well as human evaluations, are carefully established on this benchmark. We also propose a strong baseline model, Localizing Actions with Transformers (LocATe), that jointly localizes and recognizes actions in a 3D sequence. LocATe shows superior performance on BABEL-TAL as well as on the large-scale PKU-MMD dataset, achieving state-of-the-art performance by using only 10% of the labeled training data. Our research could advance the development of more accurate and efficient systems for human behavior analysis, with potential applications in areas such as human-computer interaction and healthcare.