UWV-Yolox: A Deep Learning Model for Underwater Video Object Detection

Haixia Pan; Jiahua Lan; Hongqiang Wang; Yanan Li; Meng Zhang; Mojie Ma; Dongdong Zhang; Xiaoran Zhao

doi:10.3390/s23104859

Sensors (May 2023)

Affiliations

Haixia Pan: School of Software, Beihang University, Beijing 100191, China
Jiahua Lan: School of Software, Beihang University, Beijing 100191, China
Hongqiang Wang: School of Software, Beihang University, Beijing 100191, China
Yanan Li: School of Software, Beihang University, Beijing 100191, China
Meng Zhang: School of Software, Beihang University, Beijing 100191, China
Mojie Ma: School of Software, Beihang University, Beijing 100191, China
Dongdong Zhang: School of Software, Beihang University, Beijing 100191, China
Xiaoran Zhao: School of Software, Beihang University, Beijing 100191, China

DOI: https://doi.org/10.3390/s23104859
Journal volume & issue: Vol. 23, no. 10, p. 4859

Abstract

Underwater video object detection is a challenging task due to the poor quality of underwater videos, including blurriness and low contrast. In recent years, Yolo series models have been widely applied to underwater video object detection. However, these models perform poorly on blurry and low-contrast underwater videos. Additionally, they fail to account for the contextual relationships between frame-level results. To address these challenges, we propose a video object detection model named UWV-Yolox. First, the Contrast Limited Adaptive Histogram Equalization method is used to augment the underwater videos. Then, a new CSP_CA module is proposed by adding Coordinate Attention to the backbone of the model to augment the representations of objects of interest. Next, a new loss function is proposed, including regression and jitter loss. Finally, a frame-level optimization module is proposed to optimize the detection results by utilizing the relationship between neighboring frames in videos, improving the video detection performance. To evaluate the performance of our model, we conduct experiments on the UVODD dataset built in this paper and select mAP@0.5 as the evaluation metric. The mAP@0.5 of the UWV-Yolox model reaches 89.0%, which is 3.2% better than the original Yolox model. Furthermore, compared with other object detection models, the UWV-Yolox model has more stable predictions for objects, and our improvements can be flexibly applied to other models.
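
The evaluation metric named in the abstract depends on an overlap threshold of 0.5. The following is a minimal sketch of that matching criterion, assuming the common convention that a predicted box counts as correct when its intersection-over-union (IoU) with a ground-truth box of the same class is at least 0.5. The names `iou` and `is_true_positive` and the (x_min, y_min, x_max, y_max) box layout are illustrative assumptions, not taken from the paper, and the sketch does not compute the full mean average precision.

```python
from typing import Tuple

# Assumed box layout (not from the paper): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Width and height are clamped to zero when the boxes do not overlap
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Hypothetical helper: a prediction matches when IoU >= threshold."""
    return iou(pred, gt) >= threshold


if __name__ == "__main__":
    ground_truth = (0.0, 0.0, 2.0, 2.0)
    print(is_true_positive((0.0, 0.0, 2.0, 1.0), ground_truth))
    print(is_true_positive((1.0, 1.0, 3.0, 3.0), ground_truth))
```

Under this convention, a detector that places boxes more tightly around objects gains matches at the 0.5 threshold, and average precision is then computed per class from the ranked matches before being averaged over classes.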

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
