IEEE Access (Jan 2019)
A Two-Stage Attribute-Constraint Network for Video-Based Person Re-Identification
Abstract
Person re-identification has become a popular research topic in fields such as security, criminal investigation, and video analysis. This paper aims to learn a discriminative and robust spatial–temporal representation for video-based person re-identification with a two-stage attribute-constraint network (TSAC-Net). Knowledge of pedestrian attributes can aid re-identification because attributes carry high-level semantic information and are robust to visual variations. In this paper, we manually annotate three video-based person re-identification datasets with four static appearance attributes and one dynamic appearance attribute. Each attribute is treated as a constraint imposed on the deep network. In the first stage of the TSAC-Net, we formulate re-identification as a classification problem and adopt a multi-attribute classification loss to train the CNN model. In the second stage, two LSTM networks are trained under identity and dynamic-appearance-attribute constraints. The two-stage network thus provides a spatial–temporal feature extractor for pedestrians in video sequences. In the testing phase, a spatial–temporal representation is obtained by feeding a sequence of images into the proposed TSAC-Net. We demonstrate the performance improvement gained by using attributes on several challenging person re-identification datasets (PRID2011, iLIDS-VID, MARS, and VIPeR). Moreover, extensive experiments show that our approach achieves state-of-the-art results on three video-based benchmark datasets.
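To make the two-stage scheme concrete, the following PyTorch sketch illustrates one plausible reading of the abstract: a stage-1 CNN trained with a multi-attribute classification loss (identity plus static attributes) and a stage-2 sequence model trained under identity and dynamic-attribute constraints. All class names, layer sizes, attribute class counts, the toy backbone, and the use of a single LSTM (the paper describes two) are our assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class StageOneCNN(nn.Module):
    """Stage 1 (sketch): CNN trained with a multi-attribute classification
    loss, i.e., identity plus four static appearance attributes."""
    def __init__(self, feat_dim=128, num_ids=300, static_attr_classes=(5, 5, 5, 5)):
        super().__init__()
        # Toy convolutional backbone; a real system would likely use a
        # standard backbone such as a ResNet (hypothetical choice here).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.id_head = nn.Linear(feat_dim, num_ids)
        # One classifier per static appearance attribute.
        self.attr_heads = nn.ModuleList(nn.Linear(feat_dim, c) for c in static_attr_classes)

    def forward(self, x):
        f = self.backbone(x)  # per-frame spatial feature
        return f, self.id_head(f), [h(f) for h in self.attr_heads]

class StageTwoLSTM(nn.Module):
    """Stage 2 (sketch): LSTM over per-frame CNN features, constrained by
    identity and the dynamic appearance attribute."""
    def __init__(self, feat_dim=128, hidden=128, num_ids=300, dyn_attr_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.id_head = nn.Linear(hidden, num_ids)
        self.dyn_head = nn.Linear(hidden, dyn_attr_classes)

    def forward(self, frame_feats):  # frame_feats: (B, T, feat_dim)
        out, _ = self.lstm(frame_feats)
        rep = out.mean(dim=1)  # temporally pooled spatial-temporal representation
        return rep, self.id_head(rep), self.dyn_head(rep)

# Toy forward/backward pass on random data (shapes are hypothetical).
ce = nn.CrossEntropyLoss()
cnn, seq_net = StageOneCNN(), StageTwoLSTM()

# Stage 1: multi-attribute classification loss on single frames.
imgs = torch.randn(8, 3, 128, 64)
ids = torch.randint(0, 300, (8,))
attrs = [torch.randint(0, 5, (8,)) for _ in range(4)]
_, id_logits, attr_logits = cnn(imgs)
loss1 = ce(id_logits, ids) + sum(ce(l, a) for l, a in zip(attr_logits, attrs))
loss1.backward()

# Stage 2: sequence-level loss under identity + dynamic-attribute constraints,
# with the stage-1 CNN used as a frozen per-frame feature extractor.
seq = torch.randn(4, 10, 3, 128, 64)  # (B, T, C, H, W)
with torch.no_grad():
    feats = cnn(seq.flatten(0, 1))[0].view(4, 10, -1)
seq_ids = torch.randint(0, 300, (4,))
dyn = torch.randint(0, 3, (4,))
rep, id_logits, dyn_logits = seq_net(feats)  # rep is the test-time descriptor
loss2 = ce(id_logits, seq_ids) + ce(dyn_logits, dyn)
loss2.backward()

At test time, under this reading, only the pooled representation rep would be kept and compared across camera views (e.g., by Euclidean or cosine distance); the classification heads serve purely as training-time constraints.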
Keywords