Spatial+: A new cross-validation method to evaluate geospatial machine learning models

Yanwen Wang; Mahdi Khodadadzadeh; Raúl Zurita-Milla

International Journal of Applied Earth Observation and Geoinformation (Jul 2023)

Affiliations

Yanwen Wang (corresponding author): Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, 7514AE Enschede, The Netherlands
Mahdi Khodadadzadeh: Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, 7514AE Enschede, The Netherlands
Raúl Zurita-Milla: Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, 7514AE Enschede, The Netherlands

Journal volume & issue: Vol. 121
p. 103364

Abstract

Random cross-validation (CV) is often used to evaluate geospatial machine learning models, particularly when only limited sample data are available and collecting an additional test set is infeasible. However, the prediction locations can differ substantially from the available sample, which leads to over-optimistic evaluation results. This has fostered the development of spatial CV methods. Yet these methods focus only on spatial autocorrelation and cannot guarantee that the validation subset is a good proxy for a test set that differs significantly from the sample. In this paper, we propose the spatial+ cross-validation (SP-CV) method. This method, which considers both the geographic and the feature spaces, consists of two stages. The first stage addresses spatial autocorrelation by using agglomerative hierarchical clustering to divide the available sample into blocks. The second stage deals with multiple sources of differences: it uses cluster ensembles to split the blocks into training and validation folds based on the locations of the sample data and the values of the covariates and the target variable. The proposed method is compared against random and block CV methods in a series of experiments with an Amazon basin above-ground biomass dataset and a California house price dataset. Our results show that SP-CV yielded the smallest differences between the estimated error and the reference error. This means that SP-CV produced more representative splits and led to more reliable model evaluations, which suggests that a reliable model evaluation requires considering both the geographic and the feature spaces comprehensively.
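The two-stage procedure summarized above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The average-linkage criterion, the choice of one base clustering of block means per view (locations, covariates, target), and the co-association consensus used as the cluster-ensemble step are assumptions made for illustration, and the helper names (`agglomerate`, `spatial_blocks`, `spatial_plus_folds`) are hypothetical.

```python
import math
from itertools import combinations


def distance_matrix(points):
    """Pairwise Euclidean distances between equal-length numeric rows."""
    return [[math.dist(p, q) for q in points] for p in points]


def standardize(rows):
    """Z-score each column; constant columns are only centred."""
    stats = []
    for col in zip(*rows):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((x - mu) ** 2 for x in col) / len(col)) or 1.0
        stats.append((mu, sd))
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, stats)] for row in rows]


def agglomerate(dist, n_clusters):
    """Average-linkage agglomerative clustering on a precomputed distance matrix.

    Returns one label in range(n_clusters) per item (assumed linkage choice).
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean distance between all members of clusters a and b.
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(dist)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def block_means(rows, blocks, n_blocks):
    """Column means of `rows` within each block (one summary row per block)."""
    means = []
    for b in range(n_blocks):
        members = [row for row, lab in zip(rows, blocks) if lab == b]
        means.append([sum(col) / len(members) for col in zip(*members)])
    return means


def spatial_blocks(coords, n_blocks):
    """Stage 1: group samples into spatial blocks by their coordinates."""
    return agglomerate(distance_matrix(coords), n_blocks)


def spatial_plus_folds(coords, covariates, target, n_blocks, n_folds):
    """Stage 2: assign whole blocks to folds with a simple cluster ensemble."""
    blocks = spatial_blocks(coords, n_blocks)
    views = [coords, covariates, [[y] for y in target]]
    # Hypothetical ensemble: one base clustering of block summaries per view.
    base = []
    for view in views:
        summary = standardize(block_means(view, blocks, n_blocks))
        base.append(agglomerate(distance_matrix(summary), n_folds))
    # Co-association distance: 1 - share of base clusterings pairing i and j.
    co_dist = [[1 - sum(lab[i] == lab[j] for lab in base) / len(base)
                for j in range(n_blocks)] for i in range(n_blocks)]
    # Consensus: cluster the blocks on the co-association distances.
    block_fold = agglomerate(co_dist, n_folds)
    return [block_fold[b] for b in blocks]
```

For example, `spatial_plus_folds(coords, covariates, target, n_blocks=4, n_folds=2)` returns one fold index per sample. Every sample of a block receives the same index, so each fold can be held out in turn as the validation set; because similar blocks are grouped together, a held-out fold tends, under these assumptions, to differ from the training folds in both the geographic and the feature space.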

Published in International Journal of Applied Earth Observation and Geoinformation

ISSN: 1569-8432 (Print); 1872-826X (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Geography. Anthropology. Recreation: Physical geography; Geography. Anthropology. Recreation: Environmental sciences
Website: https://www.journals.elsevier.com/international-journal-of-applied-earth-observation-and-geoinformation