Complex & Intelligent Systems (Nov 2024)
Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification
Abstract
Abstract Clothes-Changing Person Re-Identification is a challenging problem in computer vision, primarily due to the appearance variations caused by clothing changes across different camera views. This poses significant challenges to traditional person re-identification techniques that rely on clothing features. These challenges include the inconsistency of clothing and the difficulty in learning reliable clothing-irrelevant local features. To address this issue, we propose a novel network architecture called the Attention-Enhanced Multimodal Feature Fusion Network (AE-Net). AE-Net effectively mitigates the impact of clothing changes on recognition accuracy by integrating RGB global features, grayscale image features, and clothing-irrelevant features obtained through semantic segmentation. Specifically, global features capture the overall appearance of the person; grayscale image features help eliminate the interference of color in recognition; and clothing-irrelevant features derived from semantic segmentation enforce the model to learn features independent of the person’s clothing. Additionally, we introduce a multi-scale fusion attention mechanism that further enhances the model’s ability to capture both detailed and global structures, thereby improving recognition accuracy and robustness. Extensive experimental results demonstrate that AE-Net outperforms several state-of-the-art methods on the PRCC and LTCC datasets, particularly in scenarios with significant clothing changes. On the PRCC and LTCC datasets, AE-Net achieves Top-1 accuracy rates of 60.4% and 42.9%, respectively.
Keywords