IET Computer Vision (Sep 2023)

A dual‐modal graph attention interaction network for person Re‐identification

  • Wen Wang,
  • Gaoyun An,
  • Qiuqi Ruan

DOI
https://doi.org/10.1049/cvi2.12192
Journal volume & issue
Vol. 17, no. 6
pp. 687 – 699

Abstract


Person re-identification (Re-ID) is the task of matching target pedestrians across cameras in a surveillance network. Learning discriminative feature representations is the central issue for person Re-ID. A few recent methods introduce text descriptions as auxiliary information to enhance feature representations, since text offers richer semantic information and perspective consistency. However, these works usually process text and images separately, which leads to the absence of cross-modal interaction. In this article, a Dual-modal Graph Attention Interaction Network (Dual-GAIN) is proposed to integrate visual features and textual features into a heterogeneous graph and jointly model the relationship between them. The proposed Dual-GAIN mainly consists of two components: a dual-stream feature extractor and a Graph Attention Interaction Network (GAIN). Specifically, the dual-stream feature extractor is utilised to extract visual features and textual features respectively. Then, visual local features and textual features are treated as nodes to construct a multi-modal graph. Cosine-similarity-constrained attention weights are introduced in GAIN, which is designed for cross-modal interaction and feature fusion on this heterogeneous multi-modal graph. Experiments on public large-scale datasets, that is, Market-1501, CUHK03 labelled, and CUHK03 detected, demonstrate that our method achieves state-of-the-art performance.
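To make the idea concrete, below is a minimal PyTorch sketch of a cosine-similarity-constrained attention layer operating on a graph whose nodes are stacked visual local features and textual features. This is not the authors' implementation: the class name, the single-head formulation, and the exact way cosine similarity re-weights the attention matrix are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CosineConstrainedGraphAttention(nn.Module):
    """Hypothetical single-head attention layer over a heterogeneous graph
    whose nodes are visual local features and textual features."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, nodes):
        # nodes: (N, dim) -- visual local nodes and textual nodes stacked together.
        q, k, v = self.q(nodes), self.k(nodes), self.v(nodes)
        attn = torch.softmax(q @ k.t() / nodes.size(-1) ** 0.5, dim=-1)
        # Constrain attention by the pairwise cosine similarity of the raw node
        # features, so semantically dissimilar cross-modal node pairs exchange
        # less information (one plausible reading of the constraint).
        cos = F.cosine_similarity(nodes.unsqueeze(1), nodes.unsqueeze(0), dim=-1)
        attn = attn * cos.clamp(min=0.0)
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        return attn @ v  # fused node features, shape (N, dim)


# Toy usage: 6 visual part nodes and 4 text nodes with 256-d features.
visual_nodes = torch.randn(6, 256)
text_nodes = torch.randn(4, 256)
layer = CosineConstrainedGraphAttention(256)
fused = layer(torch.cat([visual_nodes, text_nodes], dim=0))  # (10, 256)
```

The cosine re-weighting here acts as a soft gate on the standard scaled dot-product attention; the paper's actual formulation of the constraint may differ.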

Keywords