International Journal of Applied Earth Observations and Geoinformation (Jul 2023)

Spatial+: A new cross-validation method to evaluate geospatial machine learning models

  • Yanwen Wang,
  • Mahdi Khodadadzadeh,
  • Raúl Zurita-Milla

Journal volume & issue
Vol. 121
p. 103364

Abstract

Random cross-validation (CV) is often used to evaluate geospatial machine learning models, particularly when only a limited amount of sample data is available and collecting an extra test set is infeasible. However, the prediction locations can differ substantially from the available sample, leading to over-optimistic evaluation results. This has fostered the development of spatial CV methods. Yet these methods focus only on spatial autocorrelation and cannot guarantee that the validation subset is a good proxy for a test set that differs significantly from the available sample. In this paper, we propose the spatial+ cross-validation (SP-CV) method. This method, which considers both the geographic and feature spaces, is composed of two stages. The first stage addresses spatial autocorrelation by using agglomerative hierarchical clustering to divide the available sample into blocks. The second stage deals with multiple sources of differences. It uses cluster ensembles to split the blocks into training and validation folds based on the locations of the sample data and the values of the covariates and target variable. The proposed method is compared against random and block CV methods in a series of experiments with Amazon basin aboveground biomass and California house price datasets. Our results show that SP-CV yielded the smallest differences from the reference error. This means that SP-CV produced more representative splits and led to more reliable model evaluations. This suggests that reliable model evaluation requires considering both the geographic and the feature spaces in a comprehensive manner.
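
The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. Stage 1 uses a naive average-linkage agglomerative clustering on the coordinates to form blocks. Stage 2 is a deliberately simplified stand-in for the cluster-ensemble step: it clusters per-block means of the standardized locations, covariates, and target, so that whole blocks (never individual samples) are assigned to folds. All function names, parameter values, and the synthetic data are assumptions made for illustration.

```python
import math
import random

def agglomerate(points, n_clusters):
    """Naive average-linkage agglomerative clustering; returns one label per point."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Average pairwise Euclidean distance between two clusters.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        _, i, j = min((dist(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] += clusters.pop(j)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels

def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def two_stage_folds(coords, features, target, n_blocks=8, n_folds=3):
    # Stage 1: spatial blocks from agglomerative clustering of the locations.
    block_of = agglomerate(coords, n_blocks)

    # Stage 2 (simplified stand-in, NOT a cluster ensemble): summarize each
    # block by the mean of its standardized location, covariates and target,
    # then cluster the block summaries into folds.
    rows = standardize([list(c) + list(f) + [t]
                        for c, f, t in zip(coords, features, target)])
    summaries = []
    for b in range(n_blocks):
        members = [r for r, lab in zip(rows, block_of) if lab == b]
        summaries.append([sum(col) / len(members) for col in zip(*members)])
    fold_of_block = agglomerate(summaries, n_folds)

    # Every sample inherits the fold of its block.
    return block_of, [fold_of_block[b] for b in block_of]

# Synthetic data (hypothetical): 60 samples, one covariate, one target.
random.seed(0)
coords = [(random.random(), random.random()) for _ in range(60)]
features = [(x + random.gauss(0, 0.1),) for x, y in coords]
target = [2 * f[0] + y + random.gauss(0, 0.1) for f, (x, y) in zip(features, coords)]

blocks, folds = two_stage_folds(coords, features, target)
for k in sorted(set(folds)):
    print(f"fold {k}: {folds.count(k)} validation samples")
```

Each fold can then serve once as the validation subset while the remaining folds form the training subset. Because folds are built from whole blocks, spatially neighbouring samples stay on the same side of each split.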

Keywords