International Journal of Applied Earth Observations and Geoinformation (Nov 2024)
Classification of arsenic contamination in soil across the EU by vis-NIR spectroscopy and machine learning
Abstract
Detecting soil arsenic (As) contamination is crucial for designing efficient soil remediation strategies; however, traditional laboratory-based As detection techniques are time- and labour-intensive and are unsuitable for large-scale spatial analyses. To address this issue, we combined machine learning (ML) with visible-near-infrared (vis-NIR) spectroscopy to develop an efficient framework for As detection in soil. The optimal spectral preprocessing method was determined, and eight ML models were compared. After hyperparameter tuning, the support vector classifier performed best, with area under the curve (AUC) and accuracy values of 0.89 and 0.83, respectively. Permutation importance identified important spectral bands at 471 and 2422 nm, which correspond to Fe-oxide and carbonate, respectively. Partial dependence plots (PDPs) for these two wavelengths revealed that the likelihood of soil As contamination decreased with increasing reflectance at 471 and 2422 nm, because higher reflectance at these bands reflects lower Fe-oxide and carbonate content. Consistent with this finding, two-way PDP analysis revealed that the As content of soil increased with increasing Fe-oxide and carbonate content. Classification performance was further improved using an ensemble technique based on the three best-performing ML models, which raised the AUC to 0.9 while accuracy remained at 0.83. Overall, the framework presented in this study enabled the precise classification of soil As content at the continental scale and indirectly explained the complex relationships between As content and soil properties.
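The workflow summarised above (a tuned support vector classifier, permutation importance, partial dependence, and a three-model ensemble) can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data in place of preprocessed vis-NIR spectra. The hyperparameter grid, the two additional ensemble members (random forest and logistic regression), the soft-voting scheme, and all variable names are illustrative assumptions, not the configuration used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for spectra: rows are soil samples, columns are
# spectral bands, y is a binary contaminated / not contaminated label.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Support vector classifier tuned by cross-validated grid search
# (the grid below is a hypothetical example).
svc_pipeline = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}
search = GridSearchCV(svc_pipeline, param_grid, scoring="roc_auc", cv=5)
search.fit(X_train, y_train)
best_svc = search.best_estimator_

# Evaluate with the two metrics reported in the abstract.
svc_auc = roc_auc_score(y_test, best_svc.predict_proba(X_test)[:, 1])
svc_acc = accuracy_score(y_test, best_svc.predict(X_test))

# Permutation importance: drop in AUC when one band is shuffled.
perm = permutation_importance(
    best_svc, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)
top_bands = np.argsort(perm.importances_mean)[::-1][:2]

# One-way partial dependence of the predicted probability on the top band.
pd_result = partial_dependence(
    best_svc, X_test, features=[int(top_bands[0])], kind="average"
)
pd_curve = pd_result["average"][0]

# Soft-voting ensemble of three classifiers (members are assumptions).
ensemble = VotingClassifier(
    estimators=[
        ("svc", best_svc),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
ens_auc = roc_auc_score(y_test, ensemble.predict_proba(X_test)[:, 1])
ens_acc = accuracy_score(y_test, ensemble.predict(X_test))

print(f"SVC      AUC={svc_auc:.2f}  accuracy={svc_acc:.2f}")
print(f"Ensemble AUC={ens_auc:.2f}  accuracy={ens_acc:.2f}")
print("Two most important band indices:", top_bands.tolist())
```

With real spectra, the column indices in `top_bands` would map back to wavelengths, and a downward-sloping `pd_curve` would correspond to the reported pattern of lower contamination likelihood at higher reflectance.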