Applied Sciences (Dec 2020)

Species Distribution Modelling via Feature Engineering and Machine Learning for Pelagic Fishes in the Mediterranean Sea

  • Dimitrios Effrosynidis,
  • Athanassios Tsikliras,
  • Avi Arampatzis,
  • Georgios Sylaios

DOI
https://doi.org/10.3390/app10248900
Journal volume & issue
Vol. 10, no. 24
p. 8900

Abstract


In this work, a fish species distribution model (SDM) was developed by merging species occurrence data with environmental layers, with the aim of producing high-resolution habitability maps for the whole Mediterranean Sea. The final model can predict the probability of occurrence of each fish species at any location in the Mediterranean Sea. Eight pelagic commercial fish species were selected for this study, namely Engraulis encrasicolus, Sardina pilchardus, Sardinella aurita, Scomber colias, Scomber scombrus, Spicara smaris, Thunnus thynnus and Xiphias gladius. The environmental predictors of the SDM were obtained from the databases of the Copernicus Marine Environment Monitoring Service (CMEMS) and the European Marine Observation and Data Network (EMODnet). Fish occurrence probabilities, available only at low resolution and with several gaps, were obtained from Aquamaps (FAO Fishbase). Data pre-processing involved feature engineering to construct 6830 features representing the distribution of several mean-monthly environmental variables over a 10-year time span. Feature selection with the ensemble Reciprocal Ranking method was used to rank the features by their relative importance; this step increased the model's performance by 34%. Ten machine learning algorithms were then applied and compared on their overall performance per species. The XGBoost algorithm performed best and was used as the final model. Among the feature categories explored, neighbor-based, extreme-value, monthly and surface features contributed most to the model. Environmental variables such as salinity, temperature, distance to coast, dissolved oxygen and nitrate were found to be the strongest predictors of the probability of occurrence of the eight species.
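
The feature categories named above (monthly, extreme-value and neighbor-based) can be illustrated with a minimal sketch. It assumes a hypothetical input format, a dict mapping a grid cell `(row, col)` to the time-ordered list of monthly means of a single environmental variable, and it derives 15 features per cell: 12 calendar-month climatologies, the minimum and maximum over the whole span, and the mean of the 8 surrounding cells. The function name `engineer_features` and the feature names are illustrative only and are not taken from the paper, which built 6830 features from many variables and layers.

```python
from statistics import mean


def engineer_features(series_by_cell, months_per_year=12):
    """Hypothetical per-cell feature construction for one variable.

    series_by_cell maps (row, col) -> list of monthly mean values,
    ordered in time (length = number of years * 12).
    Cells absent from the dict (e.g. land) are treated as missing.
    """
    # Long-term mean of every cell, reused for the neighbor-based feature.
    overall = {cell: mean(vals) for cell, vals in series_by_cell.items()}
    features = {}
    for (r, c), vals in series_by_cell.items():
        feats = {}
        # Monthly features: climatological mean of each calendar month.
        for m in range(months_per_year):
            feats[f"month_{m + 1:02d}_mean"] = mean(vals[m::months_per_year])
        # Extreme-value features over the whole time span.
        feats["min"] = min(vals)
        feats["max"] = max(vals)
        # Neighbor-based feature: mean of the long-term means of the
        # 8-connected neighbors that exist in the grid.
        neigh = [
            overall[(r + dr, c + dc)]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and (r + dr, c + dc) in overall
        ]
        feats["neighbor_mean"] = mean(neigh) if neigh else float("nan")
        features[(r, c)] = feats
    return features


if __name__ == "__main__":
    # Two hypothetical adjacent cells with two years (24 months) of data.
    demo = {(0, 0): list(range(24)), (0, 1): [10.0] * 24}
    out = engineer_features(demo)
    print(out[(0, 0)]["month_01_mean"], out[(0, 0)]["neighbor_mean"])  # 6 10.0
```

Skipping missing neighbors rather than filling them keeps coastal cells usable; a real pipeline would repeat this per variable and depth layer.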

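The ensemble Reciprocal Ranking step fuses the rankings produced by several individual feature selectors into one ranking. A minimal sketch follows, assuming the common reciprocal-rank formulation in which each feature's fused score is the sum of 1/rank over all selectors (rank 1 = most important), so a higher score means a more important feature; ranking by the inverse of that sum gives the same order. The function name `reciprocal_ranking`, the optional `top_k` cut-off and the example rankings are hypothetical, and this is not presented as the authors' exact implementation.

```python
def reciprocal_ranking(rankings, top_k=None):
    """Fuse several feature rankings by summing reciprocal ranks.

    rankings: list of lists, each ordering feature names from most to
    least important according to one selector (position 1 = best).
    Returns feature names sorted by fused score, best first, optionally
    truncated to the top_k features.
    """
    scores = {}
    for ranking in rankings:
        for position, feature in enumerate(ranking, start=1):
            # A feature missing from a ranking gets no contribution from it.
            scores[feature] = scores.get(feature, 0.0) + 1.0 / position
    # Sort by descending score; break ties by name for determinism.
    fused = sorted(scores, key=lambda f: (-scores[f], f))
    return fused[:top_k] if top_k is not None else fused


if __name__ == "__main__":
    # Three hypothetical selectors ranking three features.
    ranks = [
        ["salinity", "temperature", "nitrate"],
        ["temperature", "salinity", "nitrate"],
        ["salinity", "nitrate", "temperature"],
    ]
    print(reciprocal_ranking(ranks))  # ['salinity', 'temperature', 'nitrate']
```

Because 1/rank decays quickly, a feature ranked first by any single selector is strongly rewarded, while features that are mediocre everywhere sink.
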
Keywords