Evaluation of Machine Learning Predictions of a Highly Resolved Time Series of Chlorophyll-a Concentration

Felipe de Luca Lopes de Amorim; Johannes Rick; Gerrit Lohmann; Karen Helen Wiltshire

doi:10.3390/app11167208

Applied Sciences (Aug 2021)

Affiliations

Felipe de Luca Lopes de Amorim: Helmholtz Centre for Polar and Marine Research, Alfred Wegener Institute, Wadden Sea Research Station, Hafenstr. 43, 25992 List auf Sylt, Germany
Johannes Rick: Helmholtz Centre for Polar and Marine Research, Alfred Wegener Institute, Wadden Sea Research Station, Hafenstr. 43, 25992 List auf Sylt, Germany
Gerrit Lohmann: Division of Climate Sciences, Section of Paleoclimate Dynamics, Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, 27570 Bremerhaven, Germany
Karen Helen Wiltshire: Helmholtz Centre for Polar and Marine Research, Alfred Wegener Institute, Wadden Sea Research Station, Hafenstr. 43, 25992 List auf Sylt, Germany

DOI: https://doi.org/10.3390/app11167208
Journal volume & issue: Vol. 11, no. 16, p. 7208

Abstract

Pelagic chlorophyll-a concentrations are key to evaluating the environmental status and productivity of marine systems, and data can be provided by in situ measurements, remote sensing and modelling. However, modelling chlorophyll-a is not trivial because of its nonlinear dynamics and complexity. In this study, chlorophyll-a concentrations for the Helgoland Roads time series were modelled using a number of measured water and environmental parameters. We chose three common machine learning algorithms from the literature: the support vector machine regressor, the multi-layer perceptron (neural network) regressor and the random forest regressor. The support vector machine regressor slightly outperformed the other models. Evaluation with a test dataset and verification with an independent validation dataset showed good generalization capacity, with root mean squared errors below 1 µg L⁻¹. Feature selection and engineering were important and improved the models significantly, raising the adjusted R² by at least 48%. For comparison, we also tested SARIMA and found that its univariate nature does not allow for better results than the machine learning models. Additionally, the computing time required for SARIMA was much higher, to the point of being prohibitive.
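The abstract reports model skill as root mean squared error (RMSE) and adjusted R². It does not spell out the formulas, so the sketch below assumes the standard textbook definitions, with adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) for n samples and p predictor features. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (e.g. µg/L)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(y_true, y_pred, n_features):
    """R² penalized for the number of predictor features (assumed standard form)."""
    n = len(y_true)
    return 1.0 - (1.0 - r2(y_true, y_pred)) * (n - 1) / (n - n_features - 1)


# Made-up values for illustration only; not data from the study.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]

print(rmse(observed, predicted))            # RMSE of the toy predictions
print(adjusted_r2(observed, predicted, 2))  # adjusted R² assuming 2 features
```

Because adjusted R² shrinks as features are added without a matching gain in fit, it is a reasonable yardstick for judging feature selection and engineering across models that use different numbers of inputs.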

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
