IEEE Access (Jan 2018)

Adaptive Triplet Model for Fine-Grained Visual Categorization

  • Jingyun Liang,
  • Jinlin Guo,
  • Yanming Guo,
  • Songyang Lao

DOI
https://doi.org/10.1109/ACCESS.2018.2884695
Journal volume & issue
Vol. 6
pp. 76776 – 76786

Abstract

Fine-grained visual categorization aims to differentiate subcategories, such as species of birds, models of cars, and variants of aircraft. It often suffers from small inter-class variance and large intra-class variance. To keep dissimilar images far apart while preserving large intra-class variance, we propose an adaptive triplet model. First, images are batched as triplets and fed to a general convolutional network, which extracts convolutional image features. Then, we combine an adaptive triplet loss and a classification loss for multi-task training. The adaptive triplet loss pulls same-class embeddings together and pushes examples from different subcategories apart. It adaptively assigns different weights to hard and easy examples during training. Unlike previous hard mining mechanisms that discard all non-hard triplets, it can benefit from all informative examples. Moreover, a second-order distance function is put forward to capture local pairwise interactions of embeddings, which makes the distance measure more discriminative. The classification loss provides more direct supervision for training embeddings with category-specific concepts. Furthermore, it makes category prediction more convenient and efficient at test time. Experiments demonstrate state-of-the-art results on three popular fine-grained datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft. In addition, our network structure is relatively simple compared with previous methods, which often rely on multiple sub-networks and complex training mechanisms. It is also applicable to most up-to-date backbone networks, whereas other methods may be restricted to specific convolutional networks.
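
The general idea of combining a weighted triplet loss with a classification loss can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not give the adaptive weighting rule or the second-order distance, so the sketch assumes a softmax over per-triplet hinge values as the weighting and uses plain squared Euclidean distance as a stand-in. The names `margin`, `temperature`, and `lam` are hypothetical parameters introduced only for illustration.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance (a stand-in, not the paper's second-order distance)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_hinges(triplets, margin=1.0):
    """Per-triplet hinge max(0, d(anchor, pos) - d(anchor, neg) + margin)."""
    return [
        max(0.0, sq_dist(a, p) - sq_dist(a, n) + margin)
        for a, p, n in triplets
    ]


def weighted_triplet_loss(triplets, margin=1.0, temperature=1.0):
    """Weighted sum of hinges; harder triplets receive larger weights.

    ASSUMPTION: the weights are a softmax over the hinge values. No
    hard-mining filter is applied, so every violating triplet contributes.
    """
    hinges = triplet_hinges(triplets, margin)
    peak = max(hinges)  # subtract the max for numerical stability
    exps = [math.exp((h - peak) / temperature) for h in hinges]
    total = sum(exps)
    return sum((e / total) * h for e, h in zip(exps, hinges))


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def multitask_loss(triplets, logits, label, lam=1.0, margin=1.0, temperature=1.0):
    """Triplet term plus lam times the classification term (lam is hypothetical)."""
    triplet_term = weighted_triplet_loss(triplets, margin, temperature)
    return triplet_term + lam * cross_entropy(logits, label)


# Toy embeddings: the first triplet already satisfies the margin, the second does not.
easy = ([0.0, 0.0], [0.0, 0.0], [2.0, 0.0])
hard = ([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])
loss = multitask_loss([easy, hard], logits=[0.0, 0.0], label=0)
```

With these toy inputs the hard triplet dominates the triplet term, so the weighted loss is larger than a plain average over the two triplets would be, while the classification term is computed independently from the logits.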

Keywords