Supervised Contrastive Learning for 3D Cross-Modal Retrieval

Yeon-Seung Choo; Boeun Kim; Hyun-Sik Kim; Yong-Suk Park

doi:10.3390/app142210322

Applied Sciences (Nov 2024)

Affiliations

Yeon-Seung Choo: Contents Convergence Research Center, Korea Electronics Technology Institute (KETI), Seoul 03924, Republic of Korea
Boeun Kim: Artificial Intelligence Research Center, Korea Electronics Technology Institute (KETI), Seongnam 13509, Republic of Korea
Hyun-Sik Kim: Contents Convergence Research Center, Korea Electronics Technology Institute (KETI), Seoul 03924, Republic of Korea
Yong-Suk Park: Contents Convergence Research Center, Korea Electronics Technology Institute (KETI), Seoul 03924, Republic of Korea

Journal volume & issue: Vol. 14, no. 22, p. 10322

Abstract

Interoperability between virtual platforms requires the ability to search for and transfer digital assets across platforms. Digital assets in virtual platforms are represented in different forms, or modalities, such as images, meshes, and point clouds. Cross-modal retrieval of three-dimensional (3D) object representations is challenging because the diversity of data representations makes it difficult to find a common feature space. Recent studies have focused on obtaining feature consistency within the same classes and modalities using cross-modal center loss. However, center features are sensitive to hyperparameter variations, making cross-modal center loss susceptible to performance degradation. This paper proposes a new 3D cross-modal retrieval method that uses cross-modal supervised contrastive learning (CSupCon) and a fixed projection head (FPH) strategy. Contrastive learning mitigates the influence of hyperparameters by maximizing feature distinctiveness. The FPH strategy prevents gradient updates in the projection network, focusing training on the backbone networks. Compared to state-of-the-art (SOTA) methods, the proposed method increases mean average precision (mAP) by 1.17 and 0.14 in 3D cross-modal object retrieval experiments on the ModelNet10 and ModelNet40 datasets, respectively.
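
To make the contrastive idea in the abstract concrete, the following is a minimal sketch of a supervised contrastive loss computed over embeddings pooled from several modalities. It is not the paper's CSupCon formulation, which the abstract does not spell out. It assumes the general supervised contrastive form, in which every sample sharing the anchor's class label counts as a positive regardless of modality. The function names, the temperature value, and the toy 2D features are all hypothetical, and the fixed projection head is not modeled here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over one batch of embeddings.

    features: embeddings pooled from all modalities (assumed layout)
    labels:   one class label per embedding
    Anchors without any same-class partner are skipped.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        peak = max(logits.values())
        log_denom = peak + math.log(
            sum(math.exp(v - peak) for v in logits.values())
        )
        # Average negative log-probability of the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: image, mesh, and point-cloud embeddings for two objects classes.
features = [
    [1.0, 0.0], [0.9, 0.1], [0.95, 0.05],   # cluster A
    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95],   # cluster B
]
loss_aligned = supcon_loss(features, [0, 0, 0, 1, 1, 1])  # labels match clusters
loss_mixed = supcon_loss(features, [0, 1, 0, 1, 0, 1])    # labels cut across clusters
```

In this toy batch the loss is lower when class labels agree with the geometric clusters than when they cut across them, which is the pressure that pulls same-class features from different modalities together. In a framework with automatic differentiation, an FPH-style setup would additionally exclude the projection network's parameters from gradient updates; that step is outside the scope of this sketch.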

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
