Frontiers in Marine Science (Sep 2023)

MSGNet: multi-source guidance network for fish segmentation in underwater videos

  • Peng Zhang,
  • Hong Yu,
  • Haiqing Li,
  • Xin Zhang,
  • Sixue Wei,
  • Wan Tu,
  • Zongyi Yang,
  • Junfeng Wu,
  • Yuanshan Lin

DOI: https://doi.org/10.3389/fmars.2023.1256594
Journal volume & issue: Vol. 10

Abstract

Fish segmentation in underwater videos provides basic data for fish measurements, which is vital information that supports fish habitat monitoring and fishery resource surveys. However, because of water turbidity and insufficient lighting, fish segmentation in underwater videos suffers from low accuracy and poor robustness. Most previous work has utilized static fish appearance information while ignoring fish motion in underwater videos. Considering that motion contains additional detail, this paper proposes a method that combines appearance and motion information to guide fish segmentation in underwater videos. First, underwater videos are preprocessed to highlight fish in motion and to obtain high-quality underwater optical flow. Then, a multi-source guidance network (MSGNet) is presented to segment fish in complex underwater videos with degraded visual features. To enhance both fish appearance and motion information, a non-local-based multiple co-attention guidance module (M-CAGM) is applied in the encoder stage, in which the appearance and motion features from the intra-frame salient fish and the moving fish in video sequences are reciprocally enhanced. In addition, a feature adaptive fusion module (FAFM) is introduced in the decoder stage to avoid errors accumulating across video sequences due to blurred fish or inaccurate optical flow. Experiments based on three publicly available datasets were designed to test the performance of the proposed model. The mean pixel accuracy (mPA) and mean intersection over union (mIoU) of MSGNet were 91.89% and 88.91%, respectively, on the mixed dataset. Compared with advanced underwater fish segmentation and video object segmentation models, the proposed model significantly improved mPA and mIoU. The results showed that MSGNet achieves excellent segmentation performance in complex underwater videos and can provide an effective segmentation solution for fisheries resource assessment and ocean observation.
The proposed model and code are available via GitHub.
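The abstract's central idea, reciprocally enhancing appearance and motion features through non-local co-attention, can be sketched in a few lines. The following is a minimal illustrative sketch with NumPy, not the paper's exact M-CAGM: the function names, the residual-addition choice, and the use of a plain dot-product affinity are all assumptions, since the abstract does not specify implementation details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(appearance, motion):
    """Non-local-style co-attention between two feature maps.

    appearance, motion: (C, H, W) feature maps from the appearance
    and optical-flow streams. Each stream is re-weighted by its
    pixel-wise affinity with the other, so regions that agree across
    appearance and motion (e.g. a moving fish) are mutually enhanced.
    Illustrative sketch only; the paper's M-CAGM may differ.
    """
    C, H, W = appearance.shape
    A = appearance.reshape(C, H * W)   # (C, N), N = H*W positions
    M = motion.reshape(C, H * W)       # (C, N)
    S = A.T @ M                        # (N, N) position-wise affinity
    # Each appearance position attends over motion features, and vice
    # versa; a residual connection keeps the original stream.
    A_enh = (M @ softmax(S, axis=1).T).reshape(C, H, W) + appearance
    M_enh = (A @ softmax(S, axis=0)).reshape(C, H, W) + motion
    return A_enh, M_enh

# Example usage on random feature maps:
rng = np.random.default_rng(0)
a_feat = rng.standard_normal((4, 8, 8))
m_feat = rng.standard_normal((4, 8, 8))
a_enh, m_enh = co_attention(a_feat, m_feat)
```

In a full model these maps would come from two encoder branches (frames and optical flow), and a learned projection would typically replace the raw dot product.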

Keywords