IEEE Access (Jan 2022)
Spatiotemporal Activity Semantics Understanding Based on Foreground Object Segmentation: iCounter Scenario
Abstract
Foreground object segmentation, which captures the spatial and temporal information of moving objects in video, is a fundamental task for activity understanding in many intelligent applications, such as smart stores. Recently, several methods have been proposed for activity detection and recognition based on object segmentation. However, these methods are often inaccurate because they do not maintain the temporal consistency of object segments across frames. In this work, we propose a hierarchical approach to foreground object segmentation and activity semantics understanding from sequential video that preserves spatial and temporal connectivity across frames. The proposed system consists of two main modules: (a) a concatenated deep-learning network combining PSPNet and a convolutional GRU to segment the foreground of an object of interest; and (b) an activity-mining framework that incorporates three sub-modules: (i) a RetinaNet-based frame classifier to detect and count objects of interest; (ii) a time-domain activity and event detection algorithm; and (iii) an image-based item query engine to recognize shopping items. To evaluate the proposed approach, we designed a smart checkout box, called iCounter, to collect a shopping-activity dataset named "NOL-41," which is used in extensive experiments. The results show that the accuracy of foreground object segmentation is 90.6%, that of frame classification is 93.4%, that of activity event detection is 98.4%, and that of item query is 94.3%. Finally, the overall accuracy of the generated shopping list is 95.2%.
Keywords