Electronics Letters (Oct 2023)
Multi‐level cross‐modality learning framework for text‐based person re‐identification
Abstract
The goal of text-based person re-identification (Re-ID) is to retrieve the image of a person that corresponds to a given textual description. However, owing to intra-modality variety and cross-modality heterogeneity, it is challenging to simultaneously learn both global-level and local-level cross-modal features and align them in the same embedding space without additional networks. To address these problems, an effective multi-level cross-modality learning (MCL) framework for language-and-vision person Re-ID is proposed. Specifically, a multi-branch feature extraction (MFE) module is designed to map both global and partial semantic information into the visual and textual embeddings simultaneously, capturing intra-class semantic relationships at multiple granularities. In addition, a cross-modal alignment (CA) module is devised to match the multi-grained representations and reduce the inter-class gap from the global level to the partial level. Extensive experiments on the CUHK-PEDES and ICFG-PEDES datasets show that the method outperforms state-of-the-art models.
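The core idea of combining global-level and partial-level cross-modal matching can be illustrated with a minimal sketch. The function names, the averaging of part-wise similarities, and the weighting scheme below are illustrative assumptions, not the paper's actual MFE/CA implementation:

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity matrix between two sets of embeddings.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def multi_level_match(img_global, txt_global, img_parts, txt_parts, w_local=0.5):
    """Fuse a global similarity with an averaged part-level similarity.

    img_global/txt_global: (N, D) global embeddings.
    img_parts/txt_parts:   (N, P, D) part-level embeddings, P parts per sample.
    Returns an (N, N) image-text similarity matrix; higher means a better match.
    (Hypothetical fusion rule for illustration only.)
    """
    sim = cosine_sim(img_global, txt_global)
    n_parts = img_parts.shape[1]
    local = np.zeros_like(sim)
    for p in range(n_parts):
        # Align the p-th visual part with the p-th textual part.
        local += cosine_sim(img_parts[:, p], txt_parts[:, p])
    return sim + w_local * local / n_parts
```

At retrieval time, ranking gallery images by the fused score rewards candidates that agree with the query both holistically and part by part, which is the intuition behind aligning representations from the global level down to the partial level.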
Keywords