Scientific Reports (Nov 2024)

AIRHF-Net: an adaptive interaction representation hierarchical fusion network for occluded person re-identification

  • Shuze Geng,
  • Qiudong Yu,
  • Haowei Wang,
  • Ziyi Song

DOI
https://doi.org/10.1038/s41598-024-76781-4
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 21

Abstract

To tackle the high resource consumption in occluded person re-identification, sparse attention mechanisms based on Vision Transformers (ViTs) have become popular. However, they often suffer from performance degradation on long sequences, omission of crucial information, and token representation convergence. To address these issues, we introduce the Adaptive Interaction Representation Hierarchical Fusion Network (AIRHF-Net), designed to enhance pedestrian identity recognition in occluded scenarios. Our approach begins with an Adaptive Local-Window Interaction Encoder (AL-WIE), which overcomes the inherent subjective limitations of traditional sparse attention mechanisms. This encoder merges window attention, adaptive local attention, and interaction attention, automatically localizing and focusing on the visible pedestrian regions within an image; it effectively extracts contextual information from window-level features while minimizing the impact of occlusion noise. Additionally, recognizing that ViTs tend to lose spatial information in deeper layers, we introduce a Local Hierarchical Encoder (LHE). This component segments the input sequence along the spatial dimension and integrates features from different spatial positions to construct hierarchical local representations that substantially enhance feature discriminability. To further improve the quality and diversity of the training data, we adopt an Occlusion Data Augmentation Strategy (ODAS), which strengthens the model's capacity to extract critical information under occlusion. Extensive experiments demonstrate that our method achieves improved performance on the Occluded-DukeMTMC dataset, with a rank-1 accuracy of 69.6% and an mAP of 61.6%.
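The abstract does not give implementation details for ODAS, so the following PyTorch snippet is only a minimal, hypothetical sketch of what an occlusion-style augmentation could look like: it pastes a random noise rectangle over part of the image to simulate an occluder. The function name, probability, and area range are illustrative assumptions, not the authors' actual method.

```python
# Hypothetical sketch of an occlusion-style augmentation (not the paper's ODAS).
import random
import torch

def random_occlusion(img: torch.Tensor,
                     p: float = 0.5,
                     area_range: tuple = (0.1, 0.3)) -> torch.Tensor:
    """Randomly cover a rectangular region of a CxHxW image tensor.

    Assumed (illustrative) parameters:
      p          -- probability of applying the occlusion
      area_range -- fraction of the image area the occluder may cover
    """
    if random.random() > p:
        return img
    c, h, w = img.shape
    area = random.uniform(*area_range) * h * w
    # Sample an aspect ratio for the occluder and clamp its size to the image.
    aspect = random.uniform(0.5, 2.0)
    oh = max(1, min(h, int(round((area * aspect) ** 0.5))))
    ow = max(1, min(w, int(round((area / aspect) ** 0.5))))
    top = random.randint(0, h - oh)
    left = random.randint(0, w - ow)
    out = img.clone()
    # Fill the region with random noise to mimic an occluding object.
    out[:, top:top + oh, left:left + ow] = torch.rand(c, oh, ow)
    return out

# Usage: occluded = random_occlusion(torch.rand(3, 256, 128))
```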

Keywords