DAKRS: Domain Adaptive Knowledge-Based Retrieval System for Natural Language-Based Vehicle Retrieval

Synh Viet-Uyen Ha; Huy Dinh-Anh Le; Quang Qui-Vinh Nguyen; Nhat Minh Chung

doi:10.1109/ACCESS.2023.3260149

IEEE Access (Jan 2023)

Affiliations

Synh Viet-Uyen Ha: Vietnam National University—Ho Chi Minh City International University (VNU-HCMIU), Ho Chi Minh City, Vietnam
Huy Dinh-Anh Le: Vietnam National University—Ho Chi Minh City International University (VNU-HCMIU), Ho Chi Minh City, Vietnam
Quang Qui-Vinh Nguyen: Vietnam National University—Ho Chi Minh City International University (VNU-HCMIU), Ho Chi Minh City, Vietnam
Nhat Minh Chung: Vietnam National University—Ho Chi Minh City International University (VNU-HCMIU), Ho Chi Minh City, Vietnam

DOI: https://doi.org/10.1109/ACCESS.2023.3260149
Journal volume & issue: Vol. 11, pp. 90951–90965

Abstract

Given Natural Language (NL) text descriptions, NL-based vehicle retrieval aims to retrieve target vehicles from a multi-view, multi-camera traffic video pool. Solutions to the problem are challenged not only by the inherent distinctions between the textual and visual domains, but also by the high dimensionality of visual data, the diverse range of textual descriptions, a major lack of high-volume datasets in this relatively new field, and large domain gaps between training and test sets. To deal with these issues, existing approaches have advocated computationally expensive models that separately extract the language and vision subspaces before blending them into a shared representation space. Through our proposed Domain Adaptive Knowledge-based Retrieval System (DAKRS), we show that by taking advantage of the multi-modal information in a pretrained model, we can better focus on training robust representations in the shared space with limited labels, rather than on robust extraction of uni-modal representations, which comes with an increased computational burden. Our contributions are threefold: (i) an efficient extension of Contrastive Language-Image Pre-training (CLIP)'s transfer learning into a baseline text-to-image multi-modular vehicle retrieval framework; (ii) a data enhancement method that creates pseudo-vehicle tracks from the traffic video pool by combining the robustness of the baseline retrieval model with background subtraction; and (iii) a Semi-Supervised Domain Adaptation (SSDA) scheme that engineers pseudo-labels for adapting model parameters to the target domain. Benchmarked on CityFlow-NL, DAKRS obtains 63.20% MRR with 150.0 M parameters, illustrating competitive effectiveness and efficiency against state-of-the-art methods, without ensembling.
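The abstract reports retrieval quality as MRR (Mean Reciprocal Rank) over text queries matched against vehicle tracks in a shared embedding space. The following is a minimal sketch of that evaluation idea only, not the authors' implementation: the two-dimensional vectors stand in for text and track embeddings, and the helper names (`cosine`, `rank_tracks`, `mean_reciprocal_rank`) and all toy values are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_tracks(query_vec, track_vecs):
    """Return track indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, t) for t in track_vecs]
    return sorted(range(len(track_vecs)), key=lambda i: scores[i], reverse=True)


def mean_reciprocal_rank(rankings, targets):
    """Average of 1 / (1-based rank of the correct track) over all queries."""
    total = 0.0
    for ranking, target in zip(rankings, targets):
        total += 1.0 / (ranking.index(target) + 1)
    return total / len(rankings)


# Hypothetical toy embeddings: two text queries and three vehicle tracks.
queries = [[1.0, 0.0], [0.0, 1.0]]
tracks = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.7]]
targets = [0, 2]  # index of the correct track for each query

rankings = [rank_tracks(q, tracks) for q in queries]
mrr = mean_reciprocal_rank(rankings, targets)
print(rankings)  # [[0, 2, 1], [1, 2, 0]]
print(mrr)       # 0.75
```

In this toy case the first query ranks its target first (reciprocal rank 1) and the second ranks its target second (reciprocal rank 0.5), giving an MRR of 0.75. A reported figure such as 63.20% is the same average taken over a benchmark's full query set.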

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
