A multi‐scale feature representation and interaction network for underwater object detection

Jiaojiao Yuan; Yongli Hu; Yanfeng Sun; Baocai Yin

doi:10.1049/cvi2.12161

IET Computer Vision (Apr 2023)

Affiliations

Jiaojiao Yuan: Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing University of Technology, Beijing, China
Yongli Hu: Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing University of Technology, Beijing, China
Yanfeng Sun: Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing University of Technology, Beijing, China
Baocai Yin: Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing Institute of Artificial Intelligence, Faculty of Information Technology, Beijing University of Technology, Beijing, China

DOI: https://doi.org/10.1049/cvi2.12161
Journal volume & issue: Vol. 17, no. 3, pp. 265–281

Abstract

Compared with natural images, underwater images are usually degraded by blur, scale variation, colour shift and texture distortion, which pose considerable challenges for computer vision tasks such as object detection. In this case, generic object detection methods usually fail to achieve satisfactory performance. The main reason is considered to be that current methods lack sufficiently discriminative feature representations for degraded underwater images. A novel multi‐scale feature representation and interaction network for underwater object detection is proposed, in which two core modules are designed to enhance the discriminativeness of the feature representation for underwater images. The first is the Context Integration Module, which extracts rich context information from high‐level features and is integrated with the feature pyramid network to enhance the feature representation in a multi‐scale way. The second is the Dual‐refined Attention Interaction Module, which further enhances the feature representation through interactions between different levels of features in both the channel and spatial domains, based on an attention mechanism. The proposed model is evaluated on four public underwater datasets. Experimental comparisons with state‐of‐the‐art object detection methods show that the proposed model has leading performance, which verifies that it is effective for underwater object detection. In addition, object detection experiments are conducted on the foggy Real‐world Task‐driven Testing Set (RTTS) and on the natural image dataset Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC). The results show that the proposed model can be applied to the degraded dataset of RTTS but fails on PASCAL VOC.
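
The abstract describes interaction between feature levels in both the channel and spatial domains, but gives no implementation detail. The following is only a minimal, generic sketch of that idea in plain Python, not the authors' Dual‐refined Attention Interaction Module. It assumes two feature maps stored as nested lists of shape (channels, height, width), with the coarser map at half the resolution of the finer one. All function names (`channel_weights`, `spatial_weights`, `upsample2x`, `interact`) and the sigmoid‐of‐mean gating are hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_weights(feat):
    """One gate per channel: sigmoid of that channel's global average."""
    return [sigmoid(sum(map(sum, ch)) / (len(ch) * len(ch[0]))) for ch in feat]


def spatial_weights(feat):
    """One gate per pixel: sigmoid of the mean across channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [[sigmoid(sum(feat[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]


def upsample2x(feat):
    """Nearest-neighbour upsampling of every channel by a factor of 2."""
    return [[[row[j // 2] for j in range(2 * len(row))]
             for row in ch for _ in range(2)] for ch in feat]


def interact(low, high):
    """Fuse a finer map `low` (C, 2H, 2W) with a coarser map `high` (C, H, W).

    Assumed scheme (illustrative only): the coarse level gates the fine
    level per channel, and the fine level gates the coarse level per pixel.
    """
    high_up = upsample2x(high)
    cw = channel_weights(high_up)   # channel gates from the coarse level
    sw = spatial_weights(low)       # spatial gates from the fine level
    c, h, w = len(low), len(low[0]), len(low[0][0])
    return [[[low[k][i][j] * cw[k] + high_up[k][i][j] * sw[i][j]
              for j in range(w)] for i in range(h)] for k in range(c)]


# Usage: a 2-channel 4x4 fine map fused with a 2-channel 2x2 coarse map.
low = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
high = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
fused = interact(low, high)
```

The fused map keeps the resolution of the finer level, which is the usual convention when a coarser pyramid level is merged into a finer one.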

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640