IEEE Access (Jan 2024)
Feature Compensation Network for Prototype-Based Cross-Modal Person Re-Identification
Abstract
Cross-modality person re-identification matches person instances across different imaging modalities and remains a challenging task in surveillance systems. Models must cope with large intra-modality variation as well as a substantial modality gap. To address the modality gap, many studies employ generative models to produce cross-modality image pairs that augment the dataset; however, artifacts in the generated images can degrade the model's predictions. To tackle these problems, we adopt a two-stage network. In the first stage, we extract features from cross-modality images and use adversarial learning to produce prototype features that capture the essential components of both the RGB and IR modalities. These prototype features are used to construct a compensating feature set, which is then employed to train the re-identification model. Because the prototypes are derived from the extracted features, only key components enter the generation process. In the second stage, we apply a combination of integral probability metrics to align identities through discriminative learning and then map the modalities to narrow the gap between them. At this stage, we propose modality-specific and modality-shared loss functions, which ensure that the features belonging to each modality are preserved during training. In addition, rather than measuring point-to-point differences between the feature distributions, the model focuses on transporting one distribution onto the other, which incorporates perceptual learning. Extensive evaluation of the proposed model demonstrates improved re-identification results, confirming its ability to align the modalities and augment the feature space.
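The abstract does not spell out the alignment loss itself; as a minimal sketch, assuming PyTorch feature tensors, the maximum mean discrepancy (MMD), one common integral probability metric, could be used to compare RGB and IR feature batches as below. The kernel bandwidth, batch size, and feature dimension are illustrative assumptions, not the paper's settings.

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between rows of x and y.
    sq_dists = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd_loss(rgb_feats, ir_feats, sigma=1.0):
    # Biased estimate of the squared Maximum Mean Discrepancy between
    # the RGB and IR feature batches; it approaches zero when the two
    # feature distributions match.
    k_xx = gaussian_kernel(rgb_feats, rgb_feats, sigma).mean()
    k_yy = gaussian_kernel(ir_feats, ir_feats, sigma).mean()
    k_xy = gaussian_kernel(rgb_feats, ir_feats, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Hypothetical usage: 32 samples per batch, 2048-d backbone features.
rgb = torch.randn(32, 2048)
ir = torch.randn(32, 2048)
alignment_loss = mmd_loss(rgb, ir)
```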
Keywords