Journal of King Saud University: Computer and Information Sciences (Feb 2024)
Self-attention and long-range relationship capture network for underwater object detection
Abstract
Underwater object detection has shown significant potential for exploring underwater environments. However, underwater images often suffer from degradation caused by uneven light distribution, complex environments, and crowded, dynamic backgrounds, which in turn degrades object detection performance. In this paper, a large kernel convolutional object detection network based on self-attention and long-range relationship capture is proposed. First, a hybrid dilated large kernel attention mechanism is proposed, which adopts the idea of hybrid dilated convolution and combines the advantages of the large kernel attention mechanism and self-attention. This attention mechanism avoids the drawbacks of self-attention while retaining its adaptivity and long-range relevance. Second, a feature enhancement block called the residual reconstructed module is proposed, which captures long-range dependencies in the network and extracts more critical contextual information, thereby mitigating network degradation and the accompanying loss of accuracy. Third, an adaptive spatial feature fusion detection head is constructed, which directly learns to spatially filter features at each feature level: useless information is filtered out and only useful information is retained for fusion, further enhancing the detection capability of the network. Finally, an underwater object detection network is built on these three techniques. Extensive experiments were conducted on the well-known RUOD, Aquarium, URPC, and MS COCO datasets. The results demonstrate that the proposed approach achieves the highest mAP of 88.7%, 86.5%, 98.9%, and 71.4%, respectively, improving on prior state-of-the-art methods by 1.2, 1.5, 8.5, and 0.2 percentage points, in that order. The proposed model demonstrates the capacity to apply self-attention to local details, capture global long-range relationships, prioritize essential information, and spatially filter out irrelevant information.
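The abstract does not give implementation details, so the following is a minimal PyTorch sketch of how a hybrid dilated large kernel attention block of this kind might be composed: chained depthwise convolutions with co-prime dilation rates (the hybrid dilated convolution idea, which avoids gridding artifacts) approximate a large kernel cheaply, and a pointwise convolution then forms an attention map that is multiplied elementwise with the input, as in large kernel attention. The class name, kernel size, and dilation rates (1, 2, 3) are illustrative assumptions, not the authors' exact design.

import torch
import torch.nn as nn


class HybridDilatedLKA(nn.Module):
    """Sketch of a hybrid dilated large kernel attention block (assumed design).

    Three chained depthwise 3x3 convolutions with dilations 1, 2, 3 give an
    effective 13x13 receptive field at a fraction of the cost of a true
    13x13 kernel; a 1x1 convolution mixes channels into an attention map.
    """

    def __init__(self, dim: int, dilations=(1, 2, 3)):
        super().__init__()
        # Depthwise 3x3 convolutions; padding = dilation preserves spatial size.
        self.dw_convs = nn.Sequential(*[
            nn.Conv2d(dim, dim, kernel_size=3, padding=d, dilation=d, groups=dim)
            for d in dilations
        ])
        # Pointwise convolution produces the channel-mixed attention map.
        self.pw_conv = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw_conv(self.dw_convs(x))
        # Elementwise product: an adaptive, input-dependent reweighting,
        # giving self-attention-like behaviour without its quadratic cost.
        return x * attn


if __name__ == "__main__":
    block = HybridDilatedLKA(dim=64)
    feat = torch.randn(1, 64, 32, 32)
    assert block(feat).shape == feat.shape  # shape-preserving attention

Using consecutive co-prime dilation rates is what distinguishes the hybrid scheme from a single large-dilation convolution: each successive layer fills in the pixels the previous dilated kernel skipped, so the enlarged receptive field has no holes.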