IEEE Access (Jan 2024)

Anticipating Next Active Objects for Egocentric Videos

  • Sanket Kumar Thakur,
  • Cigdem Beyan,
  • Pietro Morerio,
  • Vittorio Murino,
  • Alessio del Bue

DOI: https://doi.org/10.1109/ACCESS.2024.3395282
Journal volume & issue: Vol. 12, pp. 61767–61779

Abstract


Active objects are those in contact with the first person in an egocentric video. This paper addresses the challenge of anticipating the future location of the next active object relative to the person in a given egocentric video clip. The task is challenging because the contact is poised to happen after the last frame observed by the model, even before any action takes place. As we aim to estimate the position of objects, the problem is particularly hard in a scenario where the observed clip and the action segment are separated by the so-called time-to-contact segment. We term this task Anticipating the Next ACTive Object (ANACTO) and introduce a transformer-based self-attention framework to tackle it. We compare our model with existing anticipation-based methods to establish relevant baselines; our approach outperforms all of them on three major egocentric datasets: EpicKitchens-100, EGTEA+, and Ego4D. We also conduct an ablation study to better demonstrate the effectiveness of the proposed and baseline methods under varying conditions. The code, as well as the ANACTO task annotations for the first two datasets, will be made available upon the acceptance of this paper.
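As a loose illustration only (not the authors' architecture, whose details are not given in this abstract), the core idea of self-attention over observed frame features, pooled to regress a future bounding box for the next active object, can be sketched with random toy weights in NumPy; all dimensions and weight names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (T, d) per-frame features from the observed clip
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # (T, T) attention weights
    return A @ V                                  # (T, d) attended features

# Toy setup: T observed frames with d-dim features (illustrative values)
T, d = 8, 16
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
W_box = rng.standard_normal((d, 4))  # regression head for a box (x, y, w, h)

H = self_attention(X, Wq, Wk, Wv)
box = H.mean(axis=0) @ W_box  # pool over time, predict next-active-object box
print(box.shape)  # (4,)
```

In a trained model, the frame features would come from a visual backbone and the attention and regression weights would be learned; this sketch only shows the data flow from an observed clip to a single anticipated box.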

Keywords