Environmental Data Science (Jan 2023)
Probabilistic Prediction of Oceanographic Velocities with Multivariate Gaussian Natural Gradient Boosting
Abstract
Many single-output regression problems require estimates of uncertainty along with the point predictions. For this purpose, there exists a class of regression algorithms that predict a conditional distribution rather than a point estimate. The off-the-shelf options are much more limited, however, when the prediction output is multivariate and a joint measure of uncertainty is required. In this paper, we predict a distribution around a multivariate random vector of dimension P, such that the joint uncertainty would quantify the probability of any vector in P-dimensional space. This is more expressive than providing separate uncertainties in each dimension. To enable joint probabilistic regression, we propose a natural gradient boosting approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution, where we focus on the multivariate Gaussian distribution. Our method is robust, can be easily trained without extensive tuning, and performs competitively in comparison to existing approaches. The motivating application of our methodology is to predict two-dimensional oceanographic currents measured by freely floating Global Drifter Program drifters using remotely sensed data. We also demonstrate the method’s performance on simulated data. We find this method excels when strong correlation between output dimensions is present. As part of this work, we have added the model to the open source package at github.com/stanfordmlgroup/ngboost.
Keywords