IEEE Access (Jan 2024)
AF-CPACNet: AnchorFree Crowd Parsing Attention-Based Characteristic Segmentation Network
Abstract
Multi-human parsing is the task of segmenting and labeling human body parts in images that contain multiple people. It is an important problem in computer vision, with applications in human pose estimation, scene understanding, and virtual reality. This paper examines features and techniques used in multi-human parsing, including deep learning models such as convolutional neural networks (CNNs) and attention mechanisms, to detect and segment human body parts accurately in crowded or complex environments. Anchor boxes often fail to capture the variation in human body shapes and poses, which leads to suboptimal performance in human parsing tasks. To address this limitation, we introduce AF-CPACNet, a novel model that eliminates the need for anchor boxes by adopting a multi-head, multi-task architecture. AF-CPACNet consists of two key components, a detection head and an edge-guided parsing module, which together enable pixel-level analysis and improve the precision of human body part segmentation. A refinement head is also incorporated to further enhance semantic parsing quality. The model captures finer details of human body parts by considering color, size, and pattern attributes in a single forward pass while operating in real time. A specialized loss function is employed to optimize semantic parsing results and improve training efficiency. We evaluate AF-CPACNet on multiple human parsing datasets, including CCIHP and CIHP, and demonstrate that it significantly outperforms existing state-of-the-art methods. Specifically, AF-CPACNet achieves an 11% improvement on the CIHP dataset and an mIoU of 67.3 on the CCIHP dataset, across both global and instance-level metrics. The open-source code is available at https://github.com/abhigoku10/AF-CPACNet.git.
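For readers unfamiliar with the mIoU metric quoted above, it is conventionally the per-class intersection-over-union averaged over classes. The following is a minimal sketch of that standard definition on flattened label maps. It is not the evaluation code used for AF-CPACNet; the function name `mean_iou` and the toy labels are hypothetical, and benchmark protocols may differ in how they treat absent classes or aggregate over a dataset.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target.

    pred, target: equal-length sequences of integer class labels
    (flattened label maps). Hypothetical helper for illustration only.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel agrees: it belongs to both the intersection and the union.
            intersection[p] += 1
            union[p] += 1
        else:
            # Pixel disagrees: it enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Skip classes that occur in neither map (assumed convention).
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: flattened 2x3 label maps with 3 classes (0 = background).
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

A score reported as 67.3 corresponds to this quantity expressed as a percentage.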
Keywords