IEEE Access (Jan 2024)
EGA-Net: Edge Guided Attention Network With Label Refinement for Parsing of Animal Body Parts
Abstract
In computer vision, semantic segmentation precisely delineates objects at the pixel level. This fundamental task continues to evolve as new modules and adjustments are introduced to suit the unique characteristics of different object classes. Pixel-level semantic segmentation is intricate and computationally intensive, especially in part-based approaches. This study proposes an edge-guided, transformer-based attention network developed for the precise partitioning of the body parts of quadruped animals. Labeling masks at the pixel level is challenging for many object categories owing to its inherent complexity, which often results in inaccurate annotations. An additional mechanism that iteratively refines labels is therefore used to enhance pixel-level accuracy between classes. The model is evaluated on the PascalPart and PartImageNet datasets using transformer architectures of various scales. Performance is measured with mean Intersection-over-Union (mIoU), Pixel Accuracy (PA), and mean Accuracy (mA). Ablation studies assess the model's performance with respect to network parameters, while the effectiveness of each component is examined using Class Activation Maps (CAM). The results show a notable 8% improvement in mIoU over existing state-of-the-art architectures, indicating the effectiveness of the proposed model in achieving fine-grained part segmentation, particularly for quadruped animals. The open-source code is available at https://github.com/abhigoku10/EGA-Net.
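The primary metric cited in the abstract, mean Intersection-over-Union, can be sketched as follows. This is a minimal NumPy illustration of the standard per-class IoU averaged over classes present in the ground truth, not the authors' evaluation code; the function name and signature are assumptions.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union for semantic segmentation.

    pred, target: integer arrays of class indices with the same shape.
    Classes absent from both prediction and ground truth are skipped
    so they do not inflate the average.
    """
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c)
        target_c = (target == c)
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:  # class c appears in neither mask; skip it
            continue
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Example: class 0 has IoU 1/2, class 1 has IoU 2/3, so mIoU = 7/12.
pred = np.array([0, 1, 1, 1])
target = np.array([0, 0, 1, 1])
print(mean_iou(pred, target, num_classes=2))  # → 0.5833...
```

Per-class Pixel Accuracy and mean Accuracy follow the same pattern, dividing correctly classified pixels by the number of ground-truth pixels per class instead of by the union.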
Keywords