IEEE Access (Jan 2024)

Research on Multi-Scale CNN and Transformer-Based Multi-Level Multi-Classification Method for Images

  • Quandeng Gou,
  • Yuheng Ren

DOI
https://doi.org/10.1109/ACCESS.2024.3433374
Journal volume & issue
Vol. 12
pp. 103049 – 103059

Abstract

Read online

With the vigorous development of digital creativity, the image data generated by it has exploded. To effectively manage massive image data, multi-level and multi-classification management of images has become very necessary. However, the existing hierarchical classification models of deep learning images are all based on convolutional neural networks, which have limitations in capturing the underlying global features. Different from this, Transformer, as a new neural network, captures the global context information through the attention mechanism, so it performs excellently in various visual recognition tasks. However, the existing work based on Transformer does not use the hierarchical structure information in the model, making it challenging to apply the model to multi-level and multi-classification tasks of images. Therefore, this paper proposes a new image multi-level and multi-classification model, which uses multi-scale CNN to effectively capture feature information at different scales and combines it with the Transformer’s ability to extract global features. At the same time, the model makes full use of the hierarchical structure information in Transformer to better understand the complex relationship of images. We have done a lot of experiments on three data sets, CIFAR-10, CIFAR-100, and CUB-200-2011, and compared the performance with the existing multi-level and multi-classification model of images. The results show that our model has higher classification accuracy and better robustness.

Keywords