Lightweight Underwater Object Detection Based on YOLO v4 and Multi-Scale Attentional Feature Fusion

Minghua Zhang; Shubo Xu; Wei Song; Qi He; Quanmiao Wei

doi:10.3390/rs13224706

Remote Sensing (Nov 2021)

Affiliations

Minghua Zhang: College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
Shubo Xu: College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
Wei Song: College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
Qi He: College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
Quanmiao Wei: East China Sea Bureau, Ministry of Natural Resources, Shanghai 200137, China

DOI: https://doi.org/10.3390/rs13224706
Journal volume & issue: Vol. 13, no. 22, p. 4706

Abstract

Underwater object detection is a challenging and attractive task in computer vision. Although object detection techniques perform well on general datasets, low visibility and color bias in complex underwater environments lead to generally poor image quality. In addition, small and densely aggregated targets yield less extractable information, which makes satisfactory results difficult to achieve. Most previous deep-learning research on underwater object detection has focused on improving detection accuracy with large networks, and lightweight detection for marine environments has rarely received attention. The resulting models are large and slow, whereas practical use of object detection in marine environments requires better real-time, lightweight performance. To address this problem, we propose a lightweight underwater object detection method based on MobileNet v2, the You Only Look Once (YOLO) v4 algorithm, and attentional feature fusion, with the aim of balancing accuracy and speed for target detection in marine environments. MobileNet v2 is combined with depth-wise separable convolution to reduce the number of model parameters and the model size. A Modified Attentional Feature Fusion (AFFM) module is used to better fuse features that are inconsistent in semantics and scale, and to improve accuracy. Experiments indicate that the proposed method obtains a mean average precision (mAP) of 81.67% on the PASCAL VOC dataset and 92.65% on the Brackish dataset, and reaches a processing speed of 44.22 frames per second (FPS) on the Brackish dataset. Moreover, the number of model parameters and the model size are compressed to 16.76% and 19.53% of those of YOLO v4, respectively, achieving a good trade-off between speed and accuracy for underwater object detection.
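
To illustrate why depth-wise separable convolution reduces the parameter count, the following minimal sketch compares the weights of a standard convolution with those of its depth-wise separable counterpart. The function names and the layer shape (256 input channels, 512 output channels, 3x3 kernel) are illustrative assumptions, not values taken from the paper, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depth-wise k x k convolution followed by a 1 x 1 point-wise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape chosen only for illustration.
c_in, c_out, k = 256, 512, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # equals 1 / c_out + 1 / k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.4f}")
```

For a 3x3 kernel the ratio approaches 1/9 as the number of output channels grows, so a single replaced layer keeps roughly 11% of its weights. The whole-model figures reported in the abstract also depend on the backbone and on which layers are replaced.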

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
