Applied Sciences (Nov 2024)

Multimodal Framework for Long-Tailed Recognition

  • Jian Chen,
  • Jianyin Zhao,
  • Jiaojiao Gu,
  • Yufeng Qin,
  • Hong Ji

DOI
https://doi.org/10.3390/app142210572
Journal volume & issue
Vol. 14, no. 22
p. 10572

Abstract

Long-tailed data distribution (i.e., a few classes account for most of the data, while most classes have very few samples) is a common problem in image classification. In this paper, we propose a novel two-stage multimodal framework for long-tailed data recognition. In the first stage, long-tailed data are used for visual-semantic contrastive learning to obtain good features; in the second stage, class-balanced data are used for classifier training. The proposed framework leverages the advantages of multimodal models and mitigates the problem of class imbalance in long-tailed data recognition. Experimental results demonstrate that the proposed framework achieves competitive performance on the CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist2018 datasets for image classification.
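
The abstract does not say how the class-balanced data for the second stage are obtained. As an illustration only, the sketch below assumes class-uniform resampling, one common way to balance a long-tailed set: a class is picked uniformly at random, then an instance is picked from that class. The function name `class_balanced_sample` and the toy label counts are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def class_balanced_sample(labels, num_draws, seed=0):
    """Return sample indices drawn so that every class is equally likely.

    Assumption: balancing is done by class-uniform resampling with
    replacement; the paper may use a different procedure.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)

    picks = []
    for _ in range(num_draws):
        chosen_class = rng.choice(classes)                # uniform over classes
        picks.append(rng.choice(by_class[chosen_class]))  # uniform within the class
    return picks


# Toy long-tailed label set (hypothetical): one head class and two tail classes.
labels = [0] * 900 + [1] * 90 + [2] * 10
picks = class_balanced_sample(labels, num_draws=3000)
drawn = Counter(labels[i] for i in picks)
print(drawn)  # counts per class are roughly equal despite the 900/90/10 imbalance
```

Because tail-class instances are drawn repeatedly under this scheme, it is usually applied only to the classifier, with the features from the first stage learned on the original long-tailed data, which matches the two-stage split described above.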

Keywords