IEEE Access (Jan 2019)
View-Invariant and Similarity Learning for Robust Person Re-Identification
Abstract
Person re-identification aims to identify pedestrians across non-overlapping camera views. Deep learning methods have been successfully applied to this problem and have achieved impressive results. However, these methods rely on either feature extraction or metric learning alone, ignoring the joint benefit and mutually complementary effects of person view-specific representations. In this paper, we propose a multi-view deep network architecture coupled with an n-pair loss (JNPL) to eliminate the complex view discrepancy and learn nonlinear mapping functions that are view-invariant. We show that the problem of large variation in the viewpoints of a pedestrian can be effectively solved using a multi-view network. We simultaneously exploit the complementary representation shared between views and propose an adaptive similarity loss function to better learn a similarity metric. In detail, we first extract view-invariant feature representations from n pairs of images using a multi-stream CNN and then aggregate these features for prediction. Given n positive pairs and a negative example, the network aggregates the feature maps of the n positive pairs, predicts the identity of the person, and at the same time learns features that discriminate the positive pairs from the negative sample. Extensive evaluations on three large-scale datasets demonstrate the substantial advantages of our method over existing state-of-the-art methods.
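The abstract does not give the exact form of the JNPL objective. As a rough illustrative sketch only, an N-pair-style loss over n anchor/positive embedding pairs and one shared negative embedding could be written as below; the function name, inputs, and softplus formulation are assumptions for illustration, not the paper's actual loss:

```python
import numpy as np

def n_pair_style_loss(anchors, positives, negative):
    """Hypothetical N-pair-style loss sketch: pull each anchor toward its
    matching positive view while pushing all anchors away from the single
    shared negative embedding. Inputs are lists/arrays of 1-D vectors."""
    per_pair = []
    for a, p in zip(anchors, positives):
        pos_sim = float(a @ p)         # similarity to the matching positive view
        neg_sim = float(a @ negative)  # similarity to the shared negative sample
        # softplus of (neg - pos): near zero when pos_sim >> neg_sim
        per_pair.append(np.log1p(np.exp(neg_sim - pos_sim)))
    return float(np.mean(per_pair))

# Toy check: aligned positives yield a lower loss than misaligned ones.
anchors = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
good = n_pair_style_loss(anchors, anchors, np.array([-1.0, 0.0]))
bad = n_pair_style_loss(anchors,
                        [np.array([-1.0, 0.0]), np.array([0.0, -1.0])],
                        np.array([1.0, 0.0]))
```

In this toy example `good < bad`, i.e. the loss decreases as each anchor's similarity to its positive view grows relative to its similarity to the negative, which is the qualitative behavior the abstract attributes to discriminating positive pairs against the negative sample.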
Keywords