Spatio‐temporal multi‐scale motion descriptor from a spatially‐constrained decomposition for online action recognition

Fabio Martínez; Antoine Manzanera; Eduardo Romero

doi:10.1049/iet-cvi.2016.0055

IET Computer Vision (Oct 2017)

Affiliations

Fabio Martínez: CIM@LAB, Universidad Nacional de Colombia, Bogotá, Colombia
Antoine Manzanera: U2IS/Robotics‐Vision, ENSTA‐ParisTech, Université de Paris‐Saclay, Palaiseau, France
Eduardo Romero: CIM@LAB, Universidad Nacional de Colombia, Bogotá, Colombia

Journal volume & issue: Vol. 11, no. 7
pp. 541 – 549

Abstract

This study presents a spatio‐temporal motion descriptor that is computed from a spatially‐constrained decomposition and applied to the online classification and recognition of human activities. The method first computes a dense optical flow without explicit spatial regularisation. Potential human actions are detected in each frame as spatially consistent moving regions of interest (RoIs). Each RoI is then sequentially partitioned into small, overlapping subregions of different sizes, and each subregion is characterised by a set of flow orientation histograms. Over time, each RoI is described by a set of recursively calculated statistics that collect information from the temporal history of the orientation histograms; together these form the action descriptor. At any time, the whole descriptor can be extracted and labelled by a previously trained support vector machine. The method was evaluated on three public datasets: (i) the ViSOR dataset was used for global classification, obtaining an average accuracy of 95%, and for recognition in long sequences, achieving an average per‐frame accuracy of 92.3%; (ii) the KTH dataset was used for global classification; and (iii) the UT‐datasets were used for the recognition task, obtaining an average accuracy of 80% (frame rate).
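
The two central ingredients named in the abstract, a flow orientation histogram per subregion and statistics updated recursively over time, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which statistics are used, how the histograms are weighted, or how many bins they have. The running mean and variance, the magnitude weighting, the `min_mag` threshold and the names `orientation_histogram` and `RunningStats` are all assumptions chosen only for illustration.

```python
import math


def orientation_histogram(flow, n_bins=8, min_mag=1e-3):
    """Histogram of flow orientations for one subregion.

    `flow` is an iterable of (u, v) flow vectors. Each vector votes for its
    orientation bin with a weight equal to its magnitude (an assumption),
    and the histogram is L1-normalised.
    """
    hist = [0.0] * n_bins
    for u, v in flow:
        mag = math.hypot(u, v)
        if mag < min_mag:  # ignore near-static pixels (assumed threshold)
            continue
        angle = math.atan2(v, u) % (2 * math.pi)  # map to [0, 2*pi)
        b = int(angle / (2 * math.pi) * n_bins) % n_bins
        hist[b] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


class RunningStats:
    """Per-bin mean and variance updated recursively, one frame at a time.

    Uses Welford's update, so no past histograms need to be stored; the
    descriptor is available at any frame, which suits online use.
    """

    def __init__(self, n_bins):
        self.n = 0
        self.mean = [0.0] * n_bins
        self.m2 = [0.0] * n_bins  # running sum of squared deviations

    def update(self, hist):
        self.n += 1
        for i, h in enumerate(hist):
            delta = h - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (h - self.mean[i])

    def descriptor(self):
        """Concatenate per-bin means and (population) variances."""
        var = [m / self.n if self.n else 0.0 for m in self.m2]
        return self.mean + var
```

In a full pipeline of this kind, one `RunningStats` object would be kept per subregion of each RoI, and the concatenated descriptors would be passed to the trained classifier whenever a label is needed.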

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
