IEEE Access (Jan 2022)

A Novel Few-Shot Action Recognition Method: Temporal Relational CrossTransformers Based on Image Difference Pyramid

  • Yihang Ding,
  • Youyuan Liu

DOI
https://doi.org/10.1109/ACCESS.2022.3204404
Journal volume & issue
Vol. 10
pp. 94536 – 94544

Abstract

Most current few-shot action recognition methods build on image classification by modeling temporal relationships, and they achieve satisfactory results. However, these methods concentrate on the additional temporal information that video carries compared with still images and match query videos through frame-tuple embeddings, while ignoring the “action changing feature”, an important source of information in action recognition. To exploit this information, we propose Temporal Relational CrossTransformers Based on Image Difference Pyramid (TRX-IDP) for few-shot action recognition. Building on TRX, we apply high-order image differencing, sigmoid enhancement, and resizing to the frame tuples used directly for the query, and we also use these frame tuples to compute the Motion History Image (MHI). Combining the two, we construct the Image Difference Pyramid (IDP), which contains motion feature information. We also develop a CrossTransformers query representation for the IDP and restructure the model's linear mapping function. We evaluate our model on four commonly used few-shot action recognition benchmark datasets. TRX-IDP achieves state-of-the-art performance on partial SSv2, HMDB51, and UCF101, while slightly lagging behind the current best models on Kinetics and SSv2. In addition, we perform detailed ablation experiments on TRX-IDP to demonstrate the importance of each component of the model and to identify its best hyperparameters.
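
The abstract does not give formulas for the image-difference steps, so the following is only a rough sketch of how high-order differencing, sigmoid enhancement, and a Motion History Image could be computed with NumPy. All function names, the `gain` and `threshold` values, and the specific sigmoid form are assumptions made for illustration. The MHI follows the commonly used decay formulation rather than anything confirmed by the paper, and the resizing step is omitted.

```python
import numpy as np


def high_order_difference(frames, order):
    """Apply frame-to-frame differencing `order` times along the time axis.

    `frames` has shape (T, H, W); the result has shape (T - order, H, W).
    """
    diff = np.asarray(frames, dtype=np.float64)
    for _ in range(order):
        diff = diff[1:] - diff[:-1]
    return diff


def sigmoid_enhance(diff, gain=10.0):
    """Map absolute differences into [0, 1) with a rescaled sigmoid.

    Hypothetical form: zero difference maps to 0, large differences
    approach 1. `gain` is an assumed hyperparameter.
    """
    return 2.0 / (1.0 + np.exp(-gain * np.abs(diff))) - 1.0


def motion_history_image(frames, threshold=0.1):
    """Common MHI formulation: moving pixels are reset to tau,
    all other pixels decay by 1 per step (floored at 0)."""
    frames = np.asarray(frames, dtype=np.float64)
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    tau = len(frames) - 1
    mhi = np.zeros(frames.shape[1:])
    for prev, curr in zip(frames[:-1], frames[1:]):
        moving = np.abs(curr - prev) > threshold
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi / tau  # normalize to [0, 1]


def build_difference_pyramid(frames, max_order=2):
    """Collect enhanced differences of order 1..max_order plus the MHI.

    Illustrative only: the paper's actual pyramid layout is not
    described in the abstract.
    """
    levels = [
        sigmoid_enhance(high_order_difference(frames, k))
        for k in range(1, max_order + 1)
    ]
    return levels, motion_history_image(frames)
```

In this sketch, each additional difference order shortens the time axis by one frame, so a tuple of T frames supports at most T - 1 orders. Recently moving pixels receive the highest MHI values, which gives a single image summarizing where and how recently motion occurred.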

Keywords