Geoderma (Feb 2024)
Effect of measurement error in wet chemistry soil data on the calibration and model performance of pedotransfer functions
Abstract
Soil properties that are considered difficult to measure are frequently determined through pedotransfer functions (PTFs). Calibration and validation datasets, containing measurements of the target soil property as well as widely available basic soil attributes, are needed to construct and validate PTFs. However, the target soil property data are prone to measurement errors, which are typically ignored when deriving new PTFs. In this contribution, we considered a PTF that predicts soil cation exchange capacity (CEC) and studied how additive and multiplicative measurement errors in the CEC calibration data affect the predictions, and the associated prediction uncertainty, of multiple linear regression (MLR) and random forest (RF) PTFs. We used data from the National Cooperative Soil Survey Soil Characterization (NCSS-SC) Database, which does not contain repeated measurements. The measurement error variance of CEC was therefore quantified using data from the Wageningen Evaluating Programmes for Analytical Laboratories (WEPAL). We assumed that this measurement error variance was representative of the NCSS-SC Database CEC data and used it to generate a probability distribution of the CEC measurement error. We used Monte Carlo simulations to add measurement errors to the original CEC values and fitted MLR- and RF-PTFs to 'error-free' and 'error-contaminated' datasets to study the effect of measurement error. For the MLR-PTFs, measurement error did not substantially change the mean estimated model coefficients relative to those estimated from the 'true' CEC. Similarly, the variable importance scores of the RF-PTFs were comparable with and without measurement error. However, the standard deviations of the model coefficient estimates and of the variable importance scores depended on dataset size. Furthermore, the performance of the MLR-PTFs decreased slightly relative to MLR-PTFs fitted on the 'true' CEC, with the model efficiency coefficient (MEC) decreasing by 0.09 to 2.53%.
Measurement error had a larger effect on the performance of the RF-PTFs, with MEC decreasing by 1.52 to 31.59%. The study showed that measurement error in the calibration data can affect the calibration and statistical validation of PTFs, especially when datasets are small or the measurement error variance is large.
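The Monte Carlo procedure summarised in the abstract can be sketched in Python with NumPy for the MLR case. This is a minimal illustration under stated assumptions, not the authors' implementation: the error variance is estimated as a pooled within-sample variance of replicate measurements (one plausible way to use proficiency-test data such as WEPAL's), errors are drawn from a zero-mean normal distribution, the multiplicative model is taken as y(1 + e), MEC is computed against an error-free validation set, and the MEC decrease is expressed relative to the error-free fit. The RF-PTF is not shown. All function names, the predictors (clay and organic carbon) and the synthetic data are hypothetical.

```python
import numpy as np


def pooled_error_variance(replicates, relative=False):
    """Pooled within-sample variance from repeated measurements (assumed estimator).

    `replicates` holds one sequence of repeated CEC determinations per sample.
    With relative=True, deviations are divided by the sample mean, giving a
    variance for a multiplicative (relative) error model.
    """
    ss, dof = 0.0, 0
    for r in replicates:
        r = np.asarray(r, dtype=float)
        dev = r - r.mean()
        if relative:
            dev = dev / r.mean()
        ss += float(np.sum(dev ** 2))
        dof += r.size - 1
    return ss / dof


def contaminate(y, var, kind, rng):
    """Add one realisation of zero-mean normal measurement error (assumption)."""
    e = rng.normal(0.0, np.sqrt(var), size=y.shape)
    if kind == "additive":
        return y + e              # y_obs = y + e
    if kind == "multiplicative":
        return y * (1.0 + e)      # y_obs = y * (1 + e), assumed form
    raise ValueError(f"unknown error kind: {kind}")


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def mec(obs, pred):
    """Model efficiency coefficient: 1 - SSE / total sum of squares."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)


def monte_carlo(Xc, yc, Xv, yv, var, kind, n_sim=200, seed=0):
    """Refit the MLR-PTF on n_sim error-contaminated copies of the calibration CEC."""
    rng = np.random.default_rng(seed)
    mec_true = mec(yv, predict(fit_mlr(Xc, yc), Xv))   # fit on 'true' CEC
    coefs, mecs = [], []
    for _ in range(n_sim):
        c = fit_mlr(Xc, contaminate(yc, var, kind, rng))
        coefs.append(c)
        mecs.append(mec(yv, predict(c, Xv)))           # validated on error-free CEC
    coefs = np.array(coefs)
    decrease = 100.0 * (mec_true - np.mean(mecs)) / mec_true  # relative MEC decrease, %
    return coefs.mean(axis=0), coefs.std(axis=0), mec_true, decrease


# Synthetic example: hypothetical clay (%) and organic carbon (%) predictors.
data_rng = np.random.default_rng(42)
n = 300
X = np.column_stack([data_rng.uniform(5, 60, n), data_rng.uniform(0.2, 5, n)])
y = 2.0 + 0.5 * X[:, 0] + 3.0 * X[:, 1] + data_rng.normal(0, 2, n)  # 'true' CEC
Xc, yc, Xv, yv = X[:200], y[:200], X[200:], y[200:]

mean_c, sd_c, mec0, dec = monte_carlo(Xc, yc, Xv, yv, var=4.0, kind="additive")
_, sd_m, _, dec_m = monte_carlo(Xc, yc, Xv, yv, var=0.01, kind="multiplicative")
print("mean coef:", mean_c, "sd coef:", sd_c, "MEC:", mec0, "decrease %:", dec)
```

For the multiplicative case the variance passed to `monte_carlo` is a relative variance, such as one obtained with `pooled_error_variance(..., relative=True)`. Rerunning with a smaller calibration set (for example only the first 30 rows) would be expected to give larger coefficient standard deviations, in line with the dataset-size effect described in the abstract.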