Applying Object Detection and Embedding Techniques to One-Shot Class-Incremental Multi-Label Image Classification

Youngki Park; Youhyun Shin

doi:10.3390/app131810468

Applied Sciences (Sep 2023)

Affiliations

Youngki Park: Department of Computer Education, Chuncheon National University of Education, Chuncheon 24328, Republic of Korea
Youhyun Shin: Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea

DOI: https://doi.org/10.3390/app131810468
Journal volume & issue: Vol. 13, no. 18, p. 10468

Abstract

In this paper, we introduce an efficient approach to multi-label image classification that is particularly suited for scenarios requiring rapid adaptation to new classes with minimal training data. Unlike conventional methods that rely solely on neural networks trained on known classes, our model integrates object detection and embedding techniques to allow for the fast and accurate classification of novel classes based on as few as one sample image. During training, we use either Convolutional Neural Network (CNN)- or Vision Transformer-based algorithms to convert the provided sample images of new classes into feature vectors. At inference, a multi-object image is analyzed using low-threshold object detection algorithms, such as YOLOS or CutLER, identifying virtually all object-containing regions. These regions are subsequently converted into candidate vectors using embedding techniques. The k-nearest neighbors are identified for each candidate vector, and labels are assigned accordingly. Our empirical evaluation, using custom multi-label datasets featuring random objects and backgrounds, reveals that our approach substantially outperforms traditional methods lacking object detection. Notably, unsupervised object detection exhibited higher speed and accuracy than its supervised counterpart. Furthermore, lightweight CNN-based embeddings were found to be both faster and more accurate than Vision Transformer-based methods. Our approach holds significant promise for applications where classes are either rarely represented or continuously evolving.
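
The labeling step described in the abstract (embed the sample images, embed each detected region, then assign labels by k-nearest neighbors) can be illustrated with a minimal sketch. It is not the authors' implementation: the object detector and the embedding network are left out, so the candidate vectors are passed in directly, and cosine similarity, majority voting, and the names `cosine`, `knn_label`, `classify_image`, and `gallery` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_label(candidate, gallery, k=1):
    """Return the majority label among the k gallery vectors closest to candidate.

    gallery is a list of (label, vector) pairs, one or more per class,
    built from the sample images of each class.
    """
    ranked = sorted(gallery, key=lambda item: cosine(candidate, item[1]), reverse=True)
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


def classify_image(candidate_vectors, gallery, k=1):
    """Multi-label prediction: the set of labels assigned to the detected regions."""
    return sorted({knn_label(c, gallery, k) for c in candidate_vectors})


# Hypothetical one-shot gallery: a single embedding per class.
gallery = [
    ("mug", [1.0, 0.0, 0.0]),
    ("key", [0.0, 1.0, 0.0]),
    ("pen", [0.0, 0.0, 1.0]),
]

# Hypothetical embeddings of two regions found by a detector in one image.
candidates = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]]

labels = classify_image(candidates, gallery)
print(labels)
```

In this sketch, adding a new class only requires appending one `(label, vector)` pair to `gallery`; nothing is retrained, which is the property that makes a nearest-neighbor design attractive for one-shot class-incremental settings.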

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
