IEEE Access (Jan 2022)
Spatiotemporal Activity Semantics Understanding Based on Foreground Object Segmentation: iCounter Scenario
Abstract
Foreground object segmentation, which captures the spatial and temporal information of moving objects in video, is a fundamental task for activity understanding in many intelligent applications, such as smart stores. Recently, several methods have been proposed for activity detection and recognition based on object segmentation. However, these methods are often inaccurate because they do not maintain the temporal consistency of object segments across frames. In this work, we propose a hierarchical approach to foreground object segmentation and activity semantics understanding from sequential video that preserves spatial and temporal connectivity across frames. The proposed system consists of two main modules: (a) a concatenated deep-learning network combining PSPNet and a convolutional GRU to segment the foreground of an object of interest; and (b) an activity-mining framework that incorporates three sub-modules: (i) a RetinaNet-based frame classifier to detect and count objects of interest; (ii) a time-domain activity and event detection algorithm; and (iii) an image-based item query engine to recognize shopping items. To evaluate the proposed approach, we designed a smart checkout box, called iCounter, to collect a shopping-activity dataset named "NOL-41," which is used in extensive experiments. The results show that the accuracy of foreground object segmentation is 90.6%, that of frame classification is 93.4%, that of activity event detection is 98.4%, and that of item query is 94.3%. Finally, the overall accuracy of the generated shopping list is 95.2%.
Keywords