IEEE Access (Jan 2022)
Feature Fusion and Center Aggregation for Visible-Infrared Person Re-Identification
Abstract
The visible-infrared person re-identification (VI Re-ID) task aims to match cross-modality pedestrian images that share the same identity labels. Most current methods focus on mitigating the modality discrepancy by adopting a two-stream network and identity supervision. Building on these methods, we propose a novel feature fusion and center aggregation learning network ($F^{2}$CALNet) for cross-modality person re-identification. $F^{2}$CALNet focuses on learning modality-irrelevant features by simultaneously reducing inter-modality discrepancies and increasing inter-identity variations within a single framework. Specifically, we first adopt a two-stream backbone network to extract modality-specific and modality-shared information. Then, we embed modality mitigation modules in the two-stream network to learn feature maps that are stripped of modality information. Finally, we devise a feature fusion and center aggregation learning module: it first fuses features of two different granularities to learn discriminative representations, and then applies two center-based loss functions that reduce intra-identity inter- and intra-modality differences and enlarge inter-identity variations by simultaneously pulling the features of each identity toward their center and pushing the centers of different identities apart. Extensive experiments on two public cross-modality datasets (SYSU-MM01 and RegDB) show that $F^{2}$CALNet outperforms state-of-the-art approaches. Furthermore, on the SYSU-MM01 dataset, our model outperforms the baseline by 5.52% in Rank-1 accuracy and 4.25% in mAP.
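To make the center aggregation idea concrete, below is a minimal PyTorch sketch of a center-based loss over a mini-batch: features of the same identity are pulled toward their per-identity center, and centers of different identities are pushed apart. The function name `center_aggregation_loss`, the `margin` value, and the exact pull/push formulation are illustrative assumptions and not the loss definition used in the paper.

```python
import torch
import torch.nn.functional as F

def center_aggregation_loss(features, labels, margin=0.3):
    """Illustrative center-based loss (not the paper's exact formulation).

    features: (N, D) embeddings of a mini-batch (visible and infrared mixed)
    labels:   (N,) identity labels
    margin:   hypothetical separation margin between identity centers
    """
    features = F.normalize(features, dim=1)            # L2-normalize embeddings
    ids = labels.unique()
    # Per-identity centers computed over the mini-batch
    centers = torch.stack([features[labels == i].mean(dim=0) for i in ids])

    # Pull term: squared distance from each feature to its own identity center
    center_idx = (labels.unsqueeze(1) == ids.unsqueeze(0)).float().argmax(dim=1)
    pull = (features - centers[center_idx]).pow(2).sum(dim=1).mean()

    # Push term: hinge on pairwise distances between different identity centers
    if len(ids) > 1:
        dist = torch.cdist(centers, centers)           # pairwise center distances
        off_diag = ~torch.eye(len(ids), dtype=torch.bool, device=dist.device)
        push = F.relu(margin - dist[off_diag]).mean()
    else:
        push = features.new_zeros(())

    return pull + push
```

In practice, such a term would be combined with the usual identity (cross-entropy) loss; the weighting between the terms is a design choice not specified here.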
Keywords