IET Computer Vision (Apr 2023)

A multi‐scale feature representation and interaction network for underwater object detection

  • Jiaojiao Yuan,
  • Yongli Hu,
  • Yanfeng Sun,
  • Baocai Yin

DOI
https://doi.org/10.1049/cvi2.12161
Journal volume & issue
Vol. 17, no. 3
pp. 265 – 281

Abstract

Compared with natural images, underwater images are usually degraded by blur, scale variation, colour shift and texture distortion, which pose considerable challenges for computer vision tasks such as object detection. As a result, generic object detection methods usually fail to achieve satisfactory performance. The main reason is thought to be that current methods do not learn sufficiently discriminative feature representations for degraded underwater images. A novel multi‐scale feature representation and interaction network for underwater object detection is proposed, in which two core modules are designed to enhance the discriminativeness of the feature representation for underwater images. The first is the Context Integration Module, which extracts rich context information from high‐level features and is integrated with the feature pyramid network to enhance the feature representation in a multi‐scale way. The second is the Dual‐refined Attention Interaction Module, which further enhances the feature representation through attention‐based interactions between features at different levels in both the channel and spatial domains. The proposed model is evaluated on four public underwater datasets. Comparisons with state‐of‐the‐art object detection methods show that the proposed model achieves leading performance, which verifies its effectiveness for underwater object detection. In addition, object detection experiments are conducted on the foggy Real‐world Task‐driven Testing Set (RTTS) and on the natural image dataset Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC). The results show that the proposed model can be applied to the degraded RTTS dataset but fails on PASCAL VOC.
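
The abstract does not specify how features at different levels interact, so the following is only a generic illustration of attention‐based interaction in the channel and spatial domains, not the authors' Dual‐refined Attention Interaction Module. It assumes two feature maps with the same number of channels, a high‐level map at half the spatial resolution of the low‐level one, and parameter‐free attention weights; the function names `upsample_nearest` and `cross_level_attention` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample_nearest(feat, factor=2):
    # Nearest-neighbour upsampling: (C, H, W) -> (C, H*factor, W*factor).
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def cross_level_attention(low, high):
    """Illustrative fusion of a low-level map (C, H, W) with a
    high-level map (C, H/2, W/2). Assumed design, not the paper's."""
    high_up = upsample_nearest(high, 2)

    # Channel attention: one weight per channel, taken from the
    # globally average-pooled high-level feature, applied to `low`.
    channel_w = sigmoid(high.mean(axis=(1, 2)))        # shape (C,)
    low_refined = low * channel_w[:, None, None]

    # Spatial attention: one weight per location, taken from the
    # channel-averaged low-level feature, applied to upsampled `high`.
    spatial_w = sigmoid(low.mean(axis=0))              # shape (H, W)
    high_refined = high_up * spatial_w[None, :, :]

    # Fuse the two refined maps by element-wise addition.
    return low_refined + high_refined

rng = np.random.default_rng(0)
low = rng.standard_normal((8, 16, 16))    # finer, lower-level features
high = rng.standard_normal((8, 8, 8))     # coarser, higher-level features
fused = cross_level_attention(low, high)
print(fused.shape)
```

In this sketch each level supplies the attention weights for the other, which is one simple way to make the interaction two‐directional; a learned module would normally replace the parameter‐free pooling with trainable layers.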

Keywords