IEEE Access (Jan 2024)

An Efficient YOLO Network With CSPCBAM, Ghost, and Cluster-NMS for Underwater Target Detection

  • Zheng Zhang,
  • Qingshan Tong,
  • Xiaofei Huang

DOI
https://doi.org/10.1109/ACCESS.2024.3368878
Journal volume & issue
Vol. 12
pp. 30562 – 30576

Abstract

In recent years, owing to rapid advances in deep learning, object detection methods such as You Only Look Once (YOLO) and Efficient Detector (EfficientDet) have been frequently used to detect underwater organisms. However, due to the complexity of underwater scenes and the constraints of deployment, these models face several challenges, including blurred targets, occlusions, and high computational cost. To address these challenges, we propose a YOLO network (CGC-YOLO) based on the Cross-Stage Partial Convolutional Block Attention Module (CSPCBAM), the Ghost module, and cluster non-maximum suppression (Cluster-NMS). Firstly, CSPCBAM enhances the model’s ability to extract intricate features by amplifying pertinent feature information across both the channel and spatial dimensions, which improves detection performance, especially on blurred targets. Secondly, the Ghost module improves the model’s efficiency by reducing its parameter count and its computational load, measured in floating-point operations (FLOPs). Finally, by introducing Cluster-NMS with a Score Penalty Mechanism (SPM) that reweights the confidence of bounding boxes, the model can retain true objects that are occluded. Experimental results show that on the Underwater Robot Picking Competition 2020 (URPC 2020) and brackish water datasets, the mAP@0.5 of the proposed CGC-YOLO reaches 87.2% and 98.6%, respectively, which is at least 1 percentage point higher than all other models. CGC-YOLO has 14.8 FLOPs and inference times of 7.1 ms and 6.3 ms on the two datasets, respectively, which is also better than all other models. Ablation experiments and qualitative analysis show that CGC-YOLO handles blurred and occluded objects well, with lower computational cost and faster inference.
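As an illustration of the last step, the following is a minimal pure-Python sketch of Cluster-NMS followed by a score penalty. It is not the authors' implementation. The function name `cluster_nms_spm`, the Gaussian form of the penalty, and the default values of `iou_thr`, `sigma`, and `score_thr` are assumptions chosen for illustration; the abstract does not specify them.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def cluster_nms_spm(boxes, scores, iou_thr=0.5, sigma=0.2, score_thr=0.01):
    """Return (hard_keep, rescored).

    hard_keep: original indices kept by plain Cluster-NMS, in score order.
    rescored:  (index, penalized score) pairs kept under the assumed
               Gaussian score penalty, in score order.
    """
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    n = len(order)
    # Upper-triangular IoU matrix: x[i][j] is nonzero only when box i
    # outranks box j (i < j in score order).
    x = [[iou(boxes[order[i]], boxes[order[j]]) if i < j else 0.0
          for j in range(n)] for i in range(n)]
    keep = [True] * n
    c = x
    for _ in range(n + 1):
        # Zero the rows of boxes that are currently suppressed, so that a
        # suppressed box can no longer suppress others.
        c = [[x[i][j] if keep[i] else 0.0 for j in range(n)] for i in range(n)]
        new_keep = [max(c[i][j] for i in range(n)) <= iou_thr for j in range(n)]
        if new_keep == keep:
            break  # fixed point reached
        keep = new_keep
    hard_keep = [order[j] for j in range(n) if keep[j]]
    rescored = []
    for j in range(n):
        # Assumed penalty: decay the score by overlap with every kept,
        # higher-scoring box instead of discarding the box outright.
        penalty = math.prod(math.exp(-(c[i][j] ** 2) / sigma) for i in range(n))
        new_score = scores[order[j]] * penalty
        if new_score > score_thr:
            rescored.append((order[j], new_score))
    return hard_keep, rescored

boxes = [(0, 0, 10, 10), (3, 0, 13, 10), (6, 0, 16, 10)]
hard_keep, rescored = cluster_nms_spm(boxes, [0.9, 0.8, 0.7])
print(hard_keep)
print(rescored)
```

In this toy input the middle box overlaps the first heavily and is removed by the hard threshold, while under the penalty it survives with a reduced score. This is the kind of behaviour that lets an occluded object remain in the final detections.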

Keywords