Entropy (May 2024)

AM-MSFF: A Pest Recognition Network Based on Attention Mechanism and Multi-Scale Feature Fusion

  • Meng Zhang,
  • Wenzhong Yang,
  • Danny Chen,
  • Chenghao Fu,
  • Fuyuan Wei

DOI
https://doi.org/10.3390/e26050431
Journal volume & issue
Vol. 26, no. 5
p. 431

Abstract

Read online

Traditional methods for pest recognition have certain limitations in addressing the challenges posed by diverse pest species, varying sizes, diverse morphologies, and complex field backgrounds, resulting in a lower recognition accuracy. To overcome these limitations, this paper proposes a novel pest recognition method based on attention mechanism and multi-scale feature fusion (AM-MSFF). By combining the advantages of attention mechanism and multi-scale feature fusion, this method significantly improves the accuracy of pest recognition. Firstly, we introduce the relation-aware global attention (RGA) module to adaptively adjust the feature weights of each position, thereby focusing more on the regions relevant to pests and reducing the background interference. Then, we propose the multi-scale feature fusion (MSFF) module to fuse feature maps from different scales, which better captures the subtle differences and the overall shape features in pest images. Moreover, we introduce generalized-mean pooling (GeMP) to more accurately extract feature information from pest images and better distinguish different pest categories. In terms of the loss function, this study proposes an improved focal loss (FL), known as balanced focal loss (BFL), as a replacement for cross-entropy loss. This improvement aims to address the common issue of class imbalance in pest datasets, thereby enhancing the recognition accuracy of pest identification models. To evaluate the performance of the AM-MSFF model, we conduct experiments on two publicly available pest datasets (IP102 and D0). Extensive experiments demonstrate that our proposed AM-MSFF outperforms most state-of-the-art methods. On the IP102 dataset, the accuracy reaches 72.64%, while on the D0 dataset, it reaches 99.05%.

Keywords