Machine Learning with Applications (Jun 2022)

Zero-shot image classification using coupled dictionary embedding

  • Mohammad Rostami,
  • Soheil Kolouri,
  • Zak Murez,
  • Yuri Owechko,
  • Eric Eaton,
  • Kyungnam Kim

Journal volume & issue
Vol. 8
p. 100278

Abstract


Zero-shot learning (ZSL) is a framework for classifying images that belong to unseen visual classes using semantic descriptions of those classes. We develop a new ZSL algorithm based on coupled dictionary learning. The core idea is to enforce the visual features and the semantic attributes of an image to share the same sparse representation in an intermediate embedding space, modeled as the shared input space of two sparsifying dictionaries. In the ZSL training stage, we use images from a number of seen classes, for which we have access to both the visual and the semantic attributes, to train two coupled dictionaries that can represent both the visual and the semantic feature vectors of an image using a single sparse vector. In the ZSL testing stage, in the absence of labeled data, images from unseen classes are mapped into the attribute space by finding their joint-sparse representations using solely the visual dictionary via solving a LASSO problem. The image is then classified in the attribute space given the semantic descriptions of the unseen classes. We also provide attribute-aware and transductive formulations to tackle the “domain-shift” and the “hubness” challenges of ZSL, respectively. Experiments on four primary datasets using VGG19 and GoogleNet visual features are provided. Our performances using VGG19 features are 91.0%, 48.4%, and 89.3% on the SUN, the CUB, and the AwA1 datasets, respectively. Our performances on the SUN, the CUB, and the AwA2 datasets are 57.0%, 49.7%, and 71.7%, respectively, when GoogleNet features are used. Comparison with existing methods demonstrates that our method is effective and compares favorably against the state-of-the-art. In particular, our algorithm leads to strong performance on all four datasets. Early partial results of this paper were presented at AAAI 2018 (Kolouri, Rostami, Owechko, & Kim, 2018).
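The test-stage pipeline the abstract describes (recover a sparse code from the visual dictionary alone via LASSO, map it into the attribute space with the attribute dictionary, then classify by the nearest unseen-class description) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dictionaries here are random stand-ins for pre-trained coupled dictionaries, all dimensions are hypothetical, and scikit-learn's `Lasso` is used as a generic LASSO solver.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d_vis, d_att, n_atoms = 64, 16, 32  # hypothetical feature/atom dimensions

# Stand-ins for the coupled dictionaries learned in the training stage;
# in the paper these would come from coupled dictionary learning on seen classes.
D_vis = rng.standard_normal((d_vis, n_atoms))  # visual dictionary
D_att = rng.standard_normal((d_att, n_atoms))  # attribute dictionary

# Visual feature vector of a test image from an unseen class (synthetic here).
x = rng.standard_normal(d_vis)

# Step 1: recover the joint-sparse code using only the visual dictionary,
# i.e. solve min_s ||x - D_vis s||^2 + alpha * ||s||_1 (a LASSO problem).
lasso = Lasso(alpha=0.1, fit_intercept=False, max_iter=5000)
lasso.fit(D_vis, x)
s = lasso.coef_  # sparse code shared by both modalities, shape (n_atoms,)

# Step 2: map the image into the attribute space via the attribute dictionary.
a_pred = D_att @ s  # predicted semantic attributes, shape (d_att,)

# Step 3: classify by the nearest semantic description among unseen classes
# (5 hypothetical unseen classes with random attribute vectors here).
unseen_attrs = rng.standard_normal((5, d_att))
label = int(np.argmin(np.linalg.norm(unseen_attrs - a_pred, axis=1)))
```

In practice the regularization weight `alpha` and the distance used in step 3 are design choices; the paper's attribute-aware and transductive variants modify this basic recipe to counter domain shift and hubness.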

Keywords