Ecological Informatics (Mar 2025)
Improving machine learning predictions to estimate fishing effort using vessel's tracking data
Abstract
Small-Scale Fisheries (SSF) comprise over 80 % of the global fleet and serve as the primary income source for numerous coastal communities. However, these critical fisheries face various threats. Effective monitoring of SSF activities and their ecological impacts requires precise estimation of fishing effort from high-resolution spatio-temporal data. This information can identify areas with high fishing density, which warrants protecting the main fishing grounds against other users (i.e. ocean grabbing), while also signalling potential stock depletion that requires management interventions and helping to preserve the ecosystems on which these fisheries depend. In this study, we propose a series of steps to enhance the performance of Machine Learning (ML) algorithms in estimating fishing effort. We assessed seven supervised ML algorithms, including Logistic Regression, Ridge Classifier, Random Forest Classifier, K-Neighbours, Gradient Boosting Classifier, LinearSVC, Recurrent Neural Networks and XGBoost, using four case studies from bivalve dredge and octopus pot and trap fisheries. First, a preliminary statistical analysis of common error measures derived from the confusion matrix led us to use accuracy, precision, and sensitivity as evaluation criteria. We found that a simple moving average applied to speed as a pre-processing technique, using ten neighbouring points, improved results by up to 3 %. Random Forest and XGBoost gave the best performance among the models compared (18 % change), using the variables Latitude, Longitude, Speed, Time, and Month (accuracies near 99 %; 61 % change). The proportion of the training/test dataset had a minimal impact on accuracy, with changes of less than 8 % when varying the training data percentage between 10 % and 90 %, making 60 % a suitable compromise.
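The moving-average pre-processing of speed mentioned above can be illustrated with a minimal sketch. The abstract does not specify how the ten neighbouring points are arranged, so this example assumes a centred window (five points on each side) that shrinks at the ends of a track; the function name `smooth_speed` and its parameter are hypothetical, not taken from the study.

```python
def smooth_speed(speeds, n_neighbours=10):
    """Centred simple moving average over a list of speed values.

    Assumption: the window holds n_neighbours // 2 points on each side
    of the current ping and is truncated at the start and end of the track.
    """
    half = n_neighbours // 2
    smoothed = []
    for i in range(len(speeds)):
        lo = max(0, i - half)
        hi = min(len(speeds), i + half + 1)
        window = speeds[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Illustrative values only (not data from the study): one spurious spike.
track = [2.0, 2.1, 9.5, 2.0, 1.9, 2.2, 2.1, 2.0]
print(smooth_speed(track))
```

The smoothed series, rather than the raw speed, would then be passed to the classifier as the Speed variable.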
Defining the sampling unit as (1) point-based (randomly selected pings) or (2) boat trip-based (randomly selected boat trips) changed accuracy by between 2.53 % and 3.99 %, depending on the model. Temporal resolution (ping rate) had minimal effects on model performance, with changes of less than 2 % for intervals from 30 s (raw data with irregular time series) to 10 min (regular time series). As a post-processing step, replacing isolated data points with neighbouring values significantly enhanced the detection of fishing events, with improvements ranging from 80 % to 250 %, depending on the model. In conclusion, this study presents a straightforward procedure for selecting a machine learning method and enhancing its classification power using simple procedures. These approaches should be applied in all studies that use machine learning to produce fishing effort maps.
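Two of the steps above, trip-based sampling and the replacement of isolated points, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: pings are assumed to be dictionaries with a hypothetical `trip_id` key, and a point is treated as "isolated" when its predicted label differs from both immediate neighbours while those neighbours agree with each other. The function names are likewise hypothetical.

```python
import random


def split_by_trip(pings, train_fraction=0.6, seed=0):
    """Assign whole boat trips to train or test, so no trip appears in both."""
    trip_ids = sorted({p["trip_id"] for p in pings})
    random.Random(seed).shuffle(trip_ids)
    n_train = round(train_fraction * len(trip_ids))
    train_trips = set(trip_ids[:n_train])
    train = [p for p in pings if p["trip_id"] in train_trips]
    test = [p for p in pings if p["trip_id"] not in train_trips]
    return train, test


def fill_isolated(labels):
    """Relabel points whose two immediate neighbours agree with each other
    but not with the point itself (assumed definition of 'isolated').

    Decisions are based on the original labels only, so one correction
    never triggers another.
    """
    cleaned = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            cleaned[i] = labels[i - 1]
    return cleaned


# Illustrative data only: 5 trips with 4 pings each.
pings = [{"trip_id": t, "ping": k} for t in range(5) for k in range(4)]
train, test = split_by_trip(pings)

# 1 = fishing, 0 = not fishing, as predicted along one trip.
predicted = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
print(fill_isolated(predicted))
```

Splitting by trip keeps consecutive pings from the same trip out of both sets at once, which is the distinction the point-based versus trip-based comparison is concerned with.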