EURASIP Journal on Image and Video Processing (Nov 2017)

Time-dependent bag of words on manifolds for geodesic-based classification of video activities towards assisted living and healthcare

  • Yixiao Yun,
  • Irene Yu-Hua Gu

DOI
https://doi.org/10.1186/s13640-017-0220-3
Journal volume & issue
Vol. 2017, no. 1
pp. 1 – 13

Abstract


In this paper, we address the problem of classifying activities of daily living (ADL) in video. The basic idea of the proposed method is to treat each human activity in the video as a temporal sequence of points on a Riemannian manifold and to classify such time series with a geodesic-based kernel. The main novelties of this paper are as follows: (a) for each frame of a video, low-level features of body pose and human-object interaction are unified by a covariance matrix, i.e., a point on the manifold of symmetric positive definite (SPD) matrices $Sym_{+}^{d}$; (b) a time-dependent bag-of-words (BoW+T) model is built, whose codebook is generated by clustering per-frame covariance matrices on $Sym_{+}^{d}$; (c) for each video, high-level BoW+T features are extracted from its sequence of per-frame covariance matrices; and (d) for activity classification, a positive definite kernel is formulated that takes into account the underlying geometry of the BoW+T features, i.e., the unit n-sphere. Experiments were conducted on two video datasets. The first dataset contains 8 activity classes with a total of 943 videos, and the second contains 7 activity classes with a total of 224 videos. The proposed method achieved high accuracy (average 89.66%) and a low false alarm rate (average 1.43%) on the first dataset. Comparison with six existing methods on the second dataset provided further evidence of the effectiveness of the proposed method.
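The pipeline in steps (a), (b), and (d) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the per-frame feature matrix is random placeholder data; covariance matrices are compared with the log-Euclidean metric, so clustering reduces to k-means on matrix logarithms (the paper may use a different Riemannian metric); histograms are L2-normalised to place them on the unit sphere; and the kernel is exp(-gamma * arccos(<h1, h2>)), an exponential of the spherical geodesic distance, chosen here because it remains positive definite on the sphere, unlike the Gaussian version. The time-dependent part of BoW+T is omitted because the abstract does not specify how it is built. All function names (`frame_covariance`, `build_codebook`, `bow_histogram`, `geodesic_kernel`) are hypothetical.

```python
import numpy as np


def frame_covariance(features, eps=1e-6):
    """Covariance descriptor of one frame (a point on Sym_+^d).

    features: (N, d) array, one d-dimensional feature vector per row
    (hypothetical input standing in for pose / human-object features).
    A small ridge eps * I keeps the matrix strictly positive definite.
    """
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def spd_log(c):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(c)
    return (v * np.log(w)) @ v.T  # V diag(log w) V^T


def log_vec(c):
    """Flatten log(C): log-Euclidean distance = Euclidean distance here."""
    return spd_log(c).ravel()


def build_codebook(covs, k, iters=20, seed=0):
    """Assumed clustering: k-means on log-mapped SPD matrices."""
    x = np.stack([log_vec(c) for c in covs])
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # copy of rows
    for _ in range(iters):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep empty clusters unchanged
                centers[j] = x[labels == j].mean(axis=0)
    return centers


def bow_histogram(video_covs, centers):
    """Nearest-codeword counts for one video, L2-normalised onto the unit sphere."""
    x = np.stack([log_vec(c) for c in video_covs])
    dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    h = np.bincount(dist.argmin(axis=1), minlength=len(centers)).astype(float)
    return h / np.linalg.norm(h)


def geodesic_kernel(h1, h2, gamma=1.0):
    """exp(-gamma * great-circle distance) between two unit vectors."""
    cos = np.clip(np.dot(h1, h2), -1.0, 1.0)
    return np.exp(-gamma * np.arccos(cos))


# Toy usage: 3 synthetic "videos" of 10 frames, 30 feature vectors per frame.
rng = np.random.default_rng(1)
d = 4  # hypothetical per-frame feature dimension
videos = [[frame_covariance(rng.normal(size=(30, d))) for _ in range(10)]
          for _ in range(3)]
centers = build_codebook([c for v in videos for c in v], k=5)
hists = [bow_histogram(v, centers) for v in videos]
K = np.array([[geodesic_kernel(a, b) for b in hists] for a in hists])
print(np.round(K, 3))
```

The resulting Gram matrix `K` could then be passed to any kernel classifier that accepts a precomputed kernel, such as an SVM. Working in the log domain keeps the sketch short; an affine-invariant metric would respect the geometry of $Sym_{+}^{d}$ more faithfully, but it requires iterative Karcher means for the cluster centres.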

Keywords