A fine‐grained image classification method based on information interaction

Shuo Zhu; Xukang Zhang; Yu Wang; Zongyang Wang; Jiahao Sun

doi:10.1049/ipr2.13295

IET Image Processing (Dec 2024)

Affiliations

Shuo Zhu: Jiangsu Province Engineering Research Center of Photonic Devices and System Integration for Communication Sensing Convergence, Wuxi University, Wuxi, China
Xukang Zhang: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China
Yu Wang: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China
Zongyang Wang: Wuxi Xiyuan Technology Co., Ltd., Wuxi, Jiangsu, China
Jiahao Sun: Jiangsu Province Engineering Research Center of Photonic Devices and System Integration for Communication Sensing Convergence, Wuxi University, Wuxi, China

Journal volume & issue: Vol. 18, no. 14
pp. 4852 – 4861

Abstract

To improve the accuracy of fine‐grained image classification and to address challenges such as excessive interference factors within the dataset, inadequate extraction of key local features, and insufficient semantic association between channels, a dual‐branch information interaction model that integrates a convolutional neural network (CNN) with a Vision Transformer is proposed. The Vision Transformer branch extracts global features, which are then combined with the CNN branch to further strengthen the model's extraction of local information. To enhance the CNN branch's ability to extract global information and to reduce the loss of feature information, a feature enhancement module is added to the CNN branch. Because convolving the input directly with the convolution kernel prevents the Vision Transformer branch from learning the low‐level features of the image, a shallow feature extraction module is proposed. The CNN and Vision Transformer branches exchange information through a down‐sampling Down module and an up‐sampling UP module. The improved method achieves accuracies of 95.2%, 97.1% and 96.9% on the CUB‐200‐2011, Stanford Cars and FGVC‐Aircraft fine‐grained image classification datasets, respectively. The experimental results show that the method generalizes well across different datasets.
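
The abstract does not specify the internals of the Down and UP modules. As a rough illustration only, the sketch below assumes a coupling in the spirit of Conformer-style CNN/Transformer hybrids: Down average-pools the CNN feature map with stride s, flattens it into tokens and projects the C channels to the token width D; UP projects the tokens back to C channels, lays them on the pooled grid and upsamples by nearest neighbour. The function names, the additive fusion and the weight matrices w_down and w_up are hypothetical, and normalization, activations, the class token, the feature enhancement module and the shallow feature extraction module are all omitted.

```python
# Hypothetical sketch of Down/UP information exchange between a CNN branch and a
# Vision Transformer branch (an assumption, not the paper's published design).
# Feature maps are C x H x W nested lists; tokens are lists of length D.
# Normalization, activations and the class token are deliberately omitted.


def avg_pool(fmap, s):
    """Average-pool a C x H x W map with an s x s window and stride s."""
    H, W = len(fmap[0]), len(fmap[0][0])
    return [[[sum(ch[i * s + a][j * s + b] for a in range(s) for b in range(s)) / (s * s)
              for j in range(W // s)]
             for i in range(H // s)]
            for ch in fmap]


def project(vec, weight):
    """Linear map (a 1x1 convolution at one position): weight is out_dim x in_dim."""
    return [sum(wv * v for wv, v in zip(row, vec)) for row in weight]


def down(cnn_fmap, s, w_down):
    """Down: pool the CNN map, flatten it row-major into tokens, project C -> D."""
    pooled = avg_pool(cnn_fmap, s)
    h, w = len(pooled[0]), len(pooled[0][0])
    return [project([ch[i][j] for ch in pooled], w_down)
            for i in range(h) for j in range(w)]


def up(tokens, h, w, s, w_up):
    """UP: project D -> C, lay tokens on an h x w grid, nearest-upsample by s."""
    proj = [project(t, w_up) for t in tokens]
    return [[[proj[(i // s) * w + (j // s)][c] for j in range(w * s)]
             for i in range(h * s)]
            for c in range(len(w_up))]


def interact(cnn_fmap, vit_tokens, s, w_down, w_up):
    """One exchange: each branch adds the other's pre-exchange features to its own."""
    H, W = len(cnn_fmap[0]), len(cnn_fmap[0][0])
    h, w = H // s, W // s
    assert len(vit_tokens) == h * w, "token count must match the pooled grid"
    d = down(cnn_fmap, s, w_down)          # CNN -> Transformer direction
    u = up(vit_tokens, h, w, s, w_up)      # Transformer -> CNN direction
    new_tokens = [[a + b for a, b in zip(t, dt)] for t, dt in zip(vit_tokens, d)]
    new_fmap = [[[cnn_fmap[c][i][j] + u[c][i][j] for j in range(W)]
                 for i in range(H)]
                for c in range(len(cnn_fmap))]
    return new_fmap, new_tokens
```

For example, with s = 2 a 4 x 4 CNN map corresponds to a 2 x 2 grid of four tokens; each call to interact returns updated features for both branches, which in a full model would feed the next CNN and Transformer stages. Additive fusion is used here only because it is the simplest choice that keeps both branches' shapes unchanged.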

Published in IET Image Processing

ISSN: 1751-9659 (Print); 1751-9667 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Photography; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519667