Journal of King Saud University: Computer and Information Sciences (Jul 2025)

GLEM: a global–local enhancement method for fine-grained image recognition with attention erasure and multi-view cropping

  • Chenglong Zhou,
  • Damin Zhang,
  • Qing He,
  • MingFang Li,
  • MingRong Li,
  • Xiaobo Zhou

DOI
https://doi.org/10.1007/s44443-025-00120-4
Journal volume & issue
Vol. 37, no. 5
pp. 1 – 13

Abstract

Fine-grained image recognition (FGIR) aims to distinguish subcategories of visual objects that differ only subtly. Because features are highly similar across categories in FGIR tasks, the model needs stronger discriminative capability. Existing methods mainly focus on learning prominent visual patterns and often neglect other potentially useful features, which makes it difficult for the model to fully distinguish subtle differences in both the global and local features of objects and limits FGIR performance. This work proposes a Global–Local Enhanced Module (GLEM) that integrates global and local features to address these issues. GLEM builds on channel-aware attention mechanisms and explores new feature details through adaptive erasure and dynamic fusion strategies, which prevents the model from focusing excessively on prominent regions. GLEM also uses multi-view cropping to capture subtle differences between global and local features. Extensive experiments on three FGIR benchmark datasets show that the proposed GLEM method achieves state-of-the-art performance.
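
To make the two ideas named in the abstract concrete, the following is a minimal sketch of generic attention-guided erasure and attention-guided cropping on a small 2D attention map. It is not the GLEM implementation: the abstract gives no formulas, so the function names (erase_mask, apply_mask, attention_crop_box), the thresholding rule, and the ratio values are all assumptions chosen for illustration, and attention values are assumed to be non-negative.

```python
from typing import List, Tuple

Map2D = List[List[float]]


def erase_mask(attention: Map2D, ratio: float = 0.7) -> List[List[int]]:
    """Return a 0/1 mask that zeroes cells whose attention is >= ratio * max.

    Hypothetical rule: the abstract does not say how erased regions are chosen.
    """
    peak = max(max(row) for row in attention)
    cutoff = ratio * peak
    return [[0 if v >= cutoff else 1 for v in row] for row in attention]


def apply_mask(feature: Map2D, mask: List[List[int]]) -> Map2D:
    """Suppress the most prominent cells so later layers must look elsewhere."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(feature, mask)]


def attention_crop_box(attention: Map2D, ratio: float = 0.5) -> Tuple[int, int, int, int]:
    """Return the inclusive box (top, left, bottom, right) around cells >= ratio * max.

    Hypothetical rule for picking one local view; a multi-view scheme could
    call this with several ratios to obtain crops at different scales.
    """
    peak = max(max(row) for row in attention)
    cutoff = ratio * peak
    rows = [i for i, row in enumerate(attention) if any(v >= cutoff for v in row)]
    cols = [j for j in range(len(attention[0]))
            if any(row[j] >= cutoff for row in attention)]
    return rows[0], cols[0], rows[-1], cols[-1]


# Toy attention map with one dominant peak at row 1, column 1.
attention = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.6],
    [0.1, 0.5, 0.2],
]
mask = erase_mask(attention, ratio=0.7)       # erases only the 0.9 cell
erased = apply_mask(attention, mask)          # peak suppressed, rest kept
box = attention_crop_box(attention, ratio=0.5)  # box around the salient cells
```

In this toy example the erasure step removes only the single strongest cell, while the crop box covers the wider salient region around it; in a real model both would operate on feature maps produced by a backbone network rather than on a hand-written grid.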

Keywords