IEEE Access (Jan 2024)
Transformer With Regularized Dual Modal Meta Metric Learning for Attribute-Image Person Re-Identification
Abstract
Attribute-image person re-identification (AIPR) is a meaningful yet challenging task that retrieves person images based on attribute descriptions. In this paper, we propose a regularized dual modal meta metric learning (RDM3L) method for AIPR, which uses meta-learning during training to enhance the transformer's capacity to acquire latent knowledge. During training, the data are first divided into a single-modal support set of images and a dual-modal query set containing both attributes and images. The RDM3L method introduces an attribute-image transformer (AIT) as a novel feature extraction backbone, extending the vision transformer. Drawing on hard sample mining, the method designs attribute-image cross-modal meta metrics and image-image intra-modal meta metrics. A triplet loss based on these meta metrics then pulls samples of the same class together and pushes samples of different classes apart, enhancing both cross-modal and intra-modal discrimination. Finally, a regularization term aggregates samples of different modalities in the query set to prevent overfitting, so that RDM3L preserves the model's generalization ability while aligning the two modalities and identifying unseen classes. Experimental results on the PETA and Market-1501 Attribute datasets demonstrate the effectiveness of RDM3L, which achieves mean average precision (mAP) scores of 36.7% on the Market-1501 Attribute dataset and 60.6% on the PETA dataset.
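To make the loss design described above concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of a hard-mining triplet loss applied over cross-modal and intra-modal distances. It assumes L2-normalized attribute and image embeddings from the backbone and integer identity labels; all function and variable names are hypothetical, and the simple MSE regularizer merely stands in for the paper's regularization term.

import torch
import torch.nn.functional as F

def hard_triplet_loss(query, support, q_labels, s_labels, margin=0.3):
    # For each query, mine the hardest positive (farthest same-identity
    # support sample) and the hardest negative (closest different-identity
    # support sample), then apply a margin-based triplet loss.
    dist = torch.cdist(query, support)                     # pairwise L2 distances
    same = q_labels.unsqueeze(1) == s_labels.unsqueeze(0)  # positive-pair mask
    pos = (dist + (~same) * -1e9).max(dim=1).values        # hardest positives
    neg = (dist + same * 1e9).min(dim=1).values            # hardest negatives
    return F.relu(pos - neg + margin).mean()

def dual_modal_loss(attr_q, img_q, img_s, q_labels, s_labels, margin=0.3):
    # Cross-modal term: attribute queries against the image support set.
    cross = hard_triplet_loss(attr_q, img_s, q_labels, s_labels, margin)
    # Intra-modal term: image queries against the image support set.
    intra = hard_triplet_loss(img_q, img_s, q_labels, s_labels, margin)
    # Regularizer pulling the two query modalities together per sample;
    # an MSE stand-in for the paper's regularization term (assumption).
    reg = F.mse_loss(attr_q, img_q)
    return cross + intra + reg

In an episode, attr_q and img_q would come from the dual-modal query set and img_s from the single-modal support set described in the abstract.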
Keywords