Applied Sciences (Feb 2025)
An Improved Small Target Segmentation Model Based on Mask Dino
Abstract
To address the low segmentation accuracy of the Mask DINO method on small objects, we propose an improved small object segmentation model called FFMask DINO. First, we introduce scaled cosine attention and the log-spaced continuous position bias (log-CPB) method into the Swin Transformer backbone network. Second, we adjust the network structure to enhance feature extraction, which helps the model maintain generalization across different datasets and reduces the risk of overfitting. Finally, we propose the FFPN module, an improved FPN that optimizes the pathways for feature fusion and transmission. The FFPN reduces unnecessary computation, accelerates model inference, and integrates multi-scale feature details with high-level semantic information to complement object features, thereby improving segmentation accuracy. Experimental results show that the improved model achieves a mean Intersection over Union (mIoU) of 42.15% on the ADE20K dataset for semantic segmentation, a 0.96% increase over the Mask DINO method. On the COCO dataset for instance segmentation, with the Swin Transformer backbone, the Mask AP and Box AP are 47.10 and 52.60, respectively, improvements of 1% and 1.3% over the Mask DINO method; with the ResNet-50 backbone, the Mask AP and Box AP are 40.00 and 44.10, respectively, improvements of 0.5% and 0.9%. On the COCO dataset for panoptic segmentation, the PQ is 54.95 with the Swin Transformer backbone, a 0.4% increase over the Mask DINO method, and 46.93 with the ResNet-50 backbone, a 0.9% increase. These results demonstrate that the proposed improvements increase the accuracy of Mask DINO in segmenting small objects across semantic, instance, and panoptic segmentation tasks.
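For readers unfamiliar with the semantic segmentation metric reported above, the following is a minimal sketch of how mIoU can be computed from per-pixel class labels. It is an illustration only, not the evaluation code used in this work: the function name `mean_iou` and the flat-list input format are assumptions, and benchmark implementations for ADE20K typically accumulate intersections and unions over the whole validation set before averaging over classes.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean Intersection over Union across classes (hypothetical helper).

    `pred` and `target` are flat sequences of integer class labels, one per pixel.
    Classes that appear in neither sequence are skipped rather than counted as 0.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where prediction and label agree
    pred_count = [0] * num_classes    # pixels predicted as each class
    target_count = [0] * num_classes  # pixels labelled as each class

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with three classes and six pixels.
    prediction = [0, 0, 1, 1, 2, 2]
    label = [0, 1, 1, 1, 2, 0]
    print(mean_iou(prediction, label, num_classes=3))
```

In the toy example, the per-class IoU values are 1/3, 2/3, and 1/2, so their mean is 0.5 up to floating-point rounding.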
Keywords