Sensors (Nov 2023)

Towards the Improvement of Soil Salinity Mapping in a Data-Scarce Context Using Sentinel-2 Images in Machine-Learning Models

  • J. W. Sirpa-Poma
  • F. Satgé
  • E. Resongles
  • R. Pillco-Zolá
  • J. Molina-Carpio
  • M. G. Flores Colque
  • M. Ormachea
  • P. Pacheco Mollinedo
  • M.-P. Bonnet

DOI
https://doi.org/10.3390/s23239328
Journal volume & issue
Vol. 23, no. 23
p. 9328

Abstract

Several recent studies have demonstrated the relevance of machine learning for soil salinity mapping, using Sentinel-2 reflectance as input data and field soil salinity measurements (i.e., electrical conductivity, EC) as the target. As soil EC monitoring is costly and time-consuming, most learning databases used for training and validation rely on a limited number of soil samples, which can affect model consistency. Based on the low variation in soil salinity at the Sentinel-2 pixel resolution, this study proposes to increase the number of observations in the learning database by assigning the EC value obtained on the sampled pixel to the eight neighboring pixels. The method extended the original learning database (OD), made up of 97 field EC measurements, to an enhanced learning database (ED) made up of 691 observations. Two classification machine-learning models (i.e., Random Forest, RF, and Support Vector Machine, SVM) were trained with both OD and ED to assess the efficiency of the proposed method by comparing the models’ outcomes with EC observations not used in the models’ training. The use of ED led to a significant increase in both models’ consistency, with the overall accuracy of the RF (SVM) model increasing from 0.25 (0.26) when using OD to 0.77 (0.55) when using ED. This corresponds to a relative improvement of approximately 208% and 111%, respectively. Besides the improved accuracy reached with ED, the results showed that the RF model provided better soil salinity estimations than the SVM model and that feature selection (i.e., Variance Inflation Factor, VIF, and/or Genetic Algorithm, GA) increases both models’ reliability, with GA being the most efficient. This study highlights the potential of combining machine learning and Sentinel-2 images for soil salinity monitoring in a data-scarce context, and shows the importance of both model and feature selection for an optimal machine-learning set-up.
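
The neighbor-based extension of the learning database described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `augment_with_neighbors`, the `(n_bands, height, width)` array layout, the `(row, col, ec_class)` sample format, and the rule of keeping only the first label when two samples share a pixel are all assumptions made for illustration.

```python
import numpy as np


def augment_with_neighbors(bands, samples):
    """Copy each sampled EC label to the 3x3 pixel window around the sample.

    bands   : array of shape (n_bands, height, width), e.g. Sentinel-2 reflectance
    samples : iterable of (row, col, ec_class) tuples (hypothetical format)

    Returns X with shape (n_obs, n_bands) and y with shape (n_obs,).
    """
    _, height, width = bands.shape
    features, labels = [], []
    seen = set()  # assumption: a pixel is used once, first label wins

    for row, col, ec_class in samples:
        # Visit the sampled pixel and its eight neighbors.
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                r, c = row + d_row, col + d_col
                # Skip neighbors that fall outside the image.
                if not (0 <= r < height and 0 <= c < width):
                    continue
                # Skip pixels already labeled by an earlier sample.
                if (r, c) in seen:
                    continue
                seen.add((r, c))
                features.append(bands[:, r, c])
                labels.append(ec_class)

    return np.array(features), np.array(labels)


# Toy example: a 2-band, 5x5 image with two labeled pixels.
toy_bands = np.arange(2 * 5 * 5).reshape(2, 5, 5)
toy_samples = [(2, 2, 1), (0, 0, 3)]
X, y = augment_with_neighbors(toy_bands, toy_samples)
```

In this sketch, a sample near the image border or close to another sample contributes fewer than nine observations, because out-of-image and already-labeled pixels are skipped. The resulting `X` and `y` could then be passed to any classifier, such as a Random Forest or SVM.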

Keywords