EFECL: Feature encoding enhancement with contrastive learning for indoor 3D object detection

Yao Duan; Renjiao Yi; Yuanming Gao; Kai Xu; Chenyang Zhu

doi:10.1007/s41095-023-0366-0

Computational Visual Media (Aug 2023)

Affiliations

Yao Duan: School of Computing, National University of Defense Technology
Renjiao Yi: School of Computing, National University of Defense Technology
Yuanming Gao: School of Computing, National University of Defense Technology
Kai Xu: School of Computing, National University of Defense Technology
Chenyang Zhu: School of Computing, National University of Defense Technology

Journal volume & issue: Vol. 9, no. 4
pp. 875 – 892

Abstract

Good initial proposals are critical for 3D object detection. However, because indoor scenes vary significantly in geometry, incomplete and noisy proposals are inevitable in most cases, and mining feature information from these “bad” proposals may mislead detection. Contrastive learning offers a feasible way to represent proposals: it can align complete and incomplete/noisy proposals in feature space, and the aligned feature space helps build robust 3D representations even when bad proposals are given. We therefore devise a new contrastive learning framework for indoor 3D object detection, called EFECL, which learns robust 3D representations by contrasting proposals at two levels. Specifically, we optimize both instance-level and category-level contrasts, aligning features by capturing instance-specific characteristics and semantic-aware common patterns. Furthermore, we propose an enhanced feature aggregation module that extracts more general and informative features for contrastive learning. Evaluations on the ScanNet V2 and SUN RGB-D benchmarks demonstrate the generalizability and effectiveness of our method, which achieves improvements of 12.3% and 7.3%, respectively, over the baseline alternatives. The code and models are publicly available at https://github.com/YaraDuan/EFECL.
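The abstract names two contrastive objectives, instance-level and category-level, but does not give their exact form. As a hedged illustration only, the sketch below pairs a standard InfoNCE loss (instance level: two encodings of the same proposal, e.g. a complete and a noisy one, are positives) with a supervised contrastive loss (category level: proposals sharing a semantic label are positives). These are common formulations assumed here, not necessarily the losses EFECL uses; the function names, the temperature `tau`, and the toy features are all hypothetical. For the authors' actual objectives, consult the linked repository.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def cosine_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def instance_contrastive_loss(views_a: Sequence[Vector],
                              views_b: Sequence[Vector],
                              tau: float = 0.1) -> float:
    """InfoNCE (assumed form): views_a[i] and views_b[i] encode the same proposal
    and form the positive pair; every views_b[j] with j != i is a negative."""
    n = len(views_a)
    total = 0.0
    for i in range(n):
        logits = [cosine_sim(views_a[i], views_b[j]) / tau for j in range(n)]
        total += log_sum_exp(logits) - logits[i]  # -log softmax of the positive
    return total / n


def category_contrastive_loss(features: Sequence[Vector],
                              labels: Sequence[str],
                              tau: float = 0.1) -> float:
    """Supervised contrastive loss (assumed form): for each anchor, every other
    proposal with the same label is a positive; all other proposals form the denominator."""
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        logits: Dict[int, float] = {
            a: cosine_sim(features[i], features[a]) / tau for a in range(n) if a != i
        }
        denom = log_sum_exp(list(logits.values()))
        total += sum(denom - logits[p] for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With toy 2-D features, matching complete/noisy views of each proposal should yield a lower instance-level loss than deliberately mismatched views, and correct category labels should yield a lower category-level loss than shuffled ones. The temperature `tau` sharpens the softmax over similarities; 0.1 is an illustrative choice, not a value taken from the paper.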

Published in Computational Visual Media

ISSN: 2096-0433 (Print); 2096-0662 (Online)
Publisher: SpringerOpen
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.springer.com/41095
