IET Computer Vision (Dec 2023)

Attribute-guided transformer for robust person re-identification

  • Zhe Wang,
  • Jun Wang,
  • Junliang Xing

DOI: https://doi.org/10.1049/cvi2.12215
Journal volume & issue: Vol. 17, no. 8, pp. 977–992

Abstract

Recent studies reveal the crucial role of local features in learning robust and discriminative representations for person re-identification (Re-ID). Existing approaches typically rely on external tasks, for example, semantic segmentation or pose estimation, to locate identifiable parts of given images. However, they heuristically utilise the predictions of off-the-shelf models, which may be sub-optimal in terms of both local partition quality and computational efficiency. They also ignore the mutual information among different inputs, which weakens the representation capability of local features. In this study, the authors put forward a novel Attribute-guided Transformer (AiT), which explicitly exploits pedestrian attributes as semantic priors for discriminative representation learning. Specifically, the authors first introduce an attribute learning process that generates a set of attention maps highlighting the informative parts of pedestrian images. Then, the authors design a Feature Diffusion Module (FDM) to iteratively inject attribute information into global feature maps, aiming at suppressing unnecessary noise and inferring attribute-aware representations. Finally, the authors propose a Feature Aggregation Module (FAM) to exploit mutual information for aggregating attribute characteristics from different images, enhancing the representation capability of the feature embedding. Extensive experiments demonstrate the superiority of AiT in learning robust and discriminative representations. As a result, the authors achieve performance competitive with state-of-the-art methods on several challenging benchmarks without any bells and whistles.
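To make the attribute-guidance idea concrete, the following is a minimal PyTorch sketch of the attribute-attention and feature-diffusion steps as described in the abstract. All module names, tensor shapes, the number of attributes, and the residual update rule are illustrative assumptions, not the paper's implementation; the actual AiT architecture (including its transformer backbone and the FAM) is not reproduced here.

    # Illustrative sketch only: module names, shapes, and the update rule
    # are assumptions; the paper's actual AiT architecture may differ.
    import torch
    import torch.nn as nn


    class AttributeAttention(nn.Module):
        """Predicts one spatial attention map per pedestrian attribute."""

        def __init__(self, channels: int, num_attributes: int):
            super().__init__()
            self.conv = nn.Conv2d(channels, num_attributes, kernel_size=1)

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            # feats: (B, C, H, W) -> per-attribute maps (B, A, H, W) in [0, 1]
            return torch.sigmoid(self.conv(feats))


    class FeatureDiffusion(nn.Module):
        """Iteratively injects attribute-attended context into global features."""

        def __init__(self, channels: int, num_iters: int = 2):
            super().__init__()
            self.num_iters = num_iters
            self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

        def forward(self, feats: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
            # Collapse the per-attribute maps into one saliency map (B, 1, H, W).
            saliency = attn.mean(dim=1, keepdim=True)
            for _ in range(self.num_iters):
                # Residual injection: emphasise attribute-salient regions,
                # leaving the rest of the feature map intact.
                feats = feats + self.fuse(feats * saliency)
            return feats


    if __name__ == "__main__":
        feats = torch.randn(4, 256, 24, 8)        # hypothetical backbone features
        attn_head = AttributeAttention(256, num_attributes=12)
        fdm = FeatureDiffusion(256)
        refined = fdm(feats, attn_head(feats))
        print(refined.shape)                      # torch.Size([4, 256, 24, 8])

In this sketch the attention maps act as soft spatial gates, so the diffusion step amplifies attribute-bearing regions while suppressing background noise, which is the role the abstract assigns to the FDM.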

Keywords