Ecological Indicators (Jan 2024)

Comparison of model selection and data bias on the prediction performance of purpleback flying squid (Sthenoteuthis oualaniensis) fishing ground in the Northwest Indian Ocean

  • Haibin Han,
  • Bohui Jiang,
  • Delong Xiang,
  • Yongchuang Shi,
  • Siyuan Liu,
  • Chen Shang,
  • Xinye Zhao,
  • Heng Zhang,
  • Yuyan Sun

Journal volume & issue
Vol. 158
p. 111526

Abstract

Purpleback flying squid (Sthenoteuthis oualaniensis, PFS) is one of the economically important cephalopod species in the northwest Indian Ocean, and selecting an appropriate model and dataset is critical for predicting and managing PFS fishing grounds. In this study, PFS fishery data for 2016–2021 were analyzed using the gravity center of fishing grounds method, a generalized additive model (GAM), gradient boosted trees (GBT), a 3D Convolutional Neural Network (3DCNN), and a 3D Convolutional Neural Network-Convolutional LSTM Network (3DCNN-ConvLSTM) to explore differences in annual catches, the annual gravity center of the fishing ground, model performance, and the importance of environmental variables between dataset A (no moonlight days) and dataset B (no moonlight days + bright moonlight days). The results are as follows: 1) Datasets A and B exhibit similar patterns of variation, with annual catches rising and then declining and the annual gravity center of the fishing ground moving northeastward overall; 2) The GAM and GBT models performed better on dataset B (GAM, average F1-score ± standard deviation: 0.678658 ± 0.014684; GBT: 0.737422 ± 0.011748) than on dataset A (GAM: 0.676802 ± 0.013403; GBT: 0.736547 ± 0.013323), but the difference was almost negligible, and the standard deviation of the GAM model was larger on dataset B. The 3DCNN and 3DCNN-ConvLSTM models showed the opposite pattern, with significantly better F1-scores and smaller standard deviations on dataset A (3DCNN: 0.75048 ± 0.019763; 3DCNN-ConvLSTM: 0.740041 ± 0.023927) than on dataset B (3DCNN: 0.746378 ± 0.020337; 3DCNN-ConvLSTM: 0.736927 ± 0.04498); 3) The 3DCNN model (best predictive performance) or the GBT model (best stability) is optimal for predicting PFS fishing grounds; 4) The GBT, 3DCNN, and 3DCNN-ConvLSTM results all showed significant differences between datasets A and B in the importance of environmental variables; 5) Unlike the GBT model, the 3DCNN and 3DCNN-ConvLSTM models were more susceptible to the choice of dataset and depended strongly on environmental variables that differ greatly between positive and negative samples. This study provides suggestions for constructing predictive models of PFS fishing grounds in the context of climate change. It also offers a new perspective on cleaning biased data for light fisheries.
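
The abstract names the gravity center of fishing grounds method without stating its formula. A minimal sketch follows, assuming the commonly used catch-weighted mean of fishing positions (longitude X = Σ(C_i · X_i) / Σ C_i, and likewise for latitude); the function name, argument names, and sample values are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence, Tuple


def gravity_center(
    lons: Sequence[float],
    lats: Sequence[float],
    catches: Sequence[float],
) -> Tuple[float, float]:
    """Return the catch-weighted mean (longitude, latitude) of fishing positions.

    Assumption: the gravity center is the catch-weighted average position,
    which is the usual definition; the paper's exact formulation may differ.
    """
    if not (len(lons) == len(lats) == len(catches)):
        raise ValueError("lons, lats and catches must have the same length")
    total = sum(catches)
    if total <= 0:
        raise ValueError("total catch must be positive")
    # Weight each position by its catch, then normalise by the total catch.
    lon_c = sum(c * x for c, x in zip(catches, lons)) / total
    lat_c = sum(c * y for c, y in zip(catches, lats)) / total
    return lon_c, lat_c


# Hypothetical example: two fishing positions with catches of 1 and 3 units.
center_lon, center_lat = gravity_center([60.0, 62.0], [15.0, 17.0], [1.0, 3.0])
```

Applying the function separately to each year's records would give the annual gravity centers whose shift is described in result 1.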

Keywords