Advances in Statistical Climatology, Meteorology and Oceanography (Oct 2024)

A robust approach to Gaussian process implementation

  • J. Mukangango,
  • A. Muyskens,
  • B. W. Priest

DOI
https://doi.org/10.5194/ascmo-10-143-2024
Journal volume & issue
Vol. 10
pp. 143 – 158

Abstract

Read online

Gaussian process (GP) regression is a flexible modeling technique used to predict outputs and to capture uncertainty in the predictions. However, the GP regression process becomes computationally intensive when the training spatial dataset has a large number of observations. To address this challenge, we introduce a scalable GP algorithm, termed MuyGPs, which incorporates nearest-neighbor and leave-one-out cross-validation during training. This approach enables the evaluation of large spatial datasets with state-of-the-art accuracy and speed in certain spatial problems. Despite these advantages, conventional quadratic loss functions used in the MuyGPs optimization, such as root mean squared error (RMSE), are highly influenced by outliers. We explore the behavior of MuyGPs in cases involving outlying observations and, subsequently, develop a robust approach to handle and mitigate their impact. Specifically, we introduce a novel leave-one-out loss function based on the pseudo-Huber function (LOOPH) that effectively accounts for outliers in large spatial datasets within the MuyGPs framework. Our simulation study shows that the LOOPH loss method maintains accuracy despite outlying observations, establishing MuyGPs as a powerful tool for mitigating unusual observation impacts in the large data regime. In the analysis of US ozone data, MuyGPs provides accurate predictions and uncertainty quantification, demonstrating its utility in managing data anomalies. Through these efforts, we advance the understanding of GP regression in spatial contexts.