Remote Sensing (Dec 2021)

Estimation of Salinity Content in Different Saline-Alkali Zones Based on Machine Learning Model Using FOD Pretreatment Method

  • Chengbiao Fu,
  • Anhong Tian,
  • Daming Zhu,
  • Junsan Zhao,
  • Heigang Xiong

DOI
https://doi.org/10.3390/rs13245140
Journal volume & issue
Vol. 13, no. 24
p. 5140

Abstract

Soil salinization is a global ecological and environmental problem in arid and semi-arid areas that can be ameliorated through soil management, and visible-near infrared-shortwave infrared (VNIR-SWIR) spectroscopy can be adapted to monitor soil salinity content rapidly. This study explored the potential of the Grünwald–Letnikov fractional-order derivative (FOD), feature band selection methods, nonlinear partial least squares regression (PLSR), and four machine learning models to estimate soil salinity content from VNIR-SWIR spectra. Ninety sample points were scanned in the field with VNIR-SWIR, and soil samples (0–20 cm) were collected at the time of scanning. The sample points came from three zones representing different intensities of human interference (Zones I, II, and III) in Fukang, Xinjiang, China, with thirty sample points in each zone. For modeling, we first applied FOD (orders from 0 to 2 at intervals of 0.1) as a preprocessing method for the soil hyperspectral data. Then, four sets of spectral bands were selected as input variables for the estimation model: R-FOD-FULL (the full band range), R-FOD-CC5 (bands that passed a 0.05 significance test), R-FOD-CC1 (bands that passed a 0.01 significance test), and R-FOD-CC1-CARS (CC1 combined with competitive adaptive reweighted sampling). Finally, four machine learning models, namely generalized regression neural network (GRNN), extreme learning machine (ELM), random forest (RF), and PLSR, were used to estimate soil salinity. The results showed the following. (1) The heat map of the correlation coefficient matrix between the hyperspectral data and salinity indicated that FOD significantly improved the correlation. (2) R-FOD-CC1 extracted fewer characteristic band variables, with less redundancy between bands, than R-FOD-FULL and R-FOD-CC5, so its estimation accuracy was higher than that of R-FOD-CC5 or R-FOD-FULL; high prediction accuracy was achieved with less complex calculation. (3) Overall, the GRNN model yielded the best salinity estimation in all three zones compared with ELM, BPNN, RF, and PLSR, whereas the RF model performed worst. The R-FOD-CC1-CARS-GRNN model yielded the best salinity estimation in Zone I, with R², RMSE, and RPD of 0.7784, 1.8762, and 2.0568, respectively; the fractional order was 1.5, and the estimation performance was very good. The optimal model for predicting soil salinity in Zones II and III was also R-FOD-CC1-CARS-GRNN (R² = 0.7912, RMSE = 3.4001, and RPD = 1.8985 in Zone II; R² = 0.8192, RMSE = 6.6260, and RPD = 1.8190 in Zone III), with fractional orders of 1.7 and 1.6, respectively, and the estimation performance was good in both zones. (4) The best models in Zones I, II, and III selected 8, 9, and 11 characteristic bands, respectively, accounting for 0.45%, 0.51%, and 0.63% of the full bands. This approach reduces the number of band variables in the model and simplifies the model structure.
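
The FOD preprocessing step described in the abstract can be illustrated with a minimal Python sketch of the standard Grünwald–Letnikov difference formula. This is not the authors' code: the function names `gl_weights` and `gl_fod`, the unit band spacing (`step=1.0`), and the use of all available lower-index bands as the memory window are assumptions made for illustration only.

```python
import numpy as np


def gl_weights(alpha, n_terms):
    """Grünwald–Letnikov coefficients w_k = (-1)^k * binom(alpha, k).

    Computed with the recurrence w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k).
    """
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)
    return w


def gl_fod(spectrum, alpha, step=1.0):
    """Fractional-order derivative of a 1-D spectrum (hypothetical helper).

    Assumptions: bands are evenly spaced by `step`, and each band uses every
    lower-index band as its history (no short-memory truncation).
    """
    y = np.asarray(spectrum, dtype=float)
    w = gl_weights(alpha, y.size)
    out = np.empty(y.size)
    for i in range(y.size):
        # sum over k = 0..i of w_k * y[i - k]
        out[i] = np.dot(w[: i + 1], y[i::-1])
    return out / step**alpha


# Orders 0.0, 0.1, ..., 2.0, matching the range and interval in the abstract.
orders = np.round(np.arange(0, 2.01, 0.1), 1)

# Toy reflectance values, not data from the study.
toy_spectrum = np.array([1.0, 4.0, 9.0, 16.0])
fod_stack = np.vstack([gl_fod(toy_spectrum, a) for a in orders])
```

At integer orders the formula reduces to ordinary finite differences (order 0 returns the spectrum unchanged, order 1 gives first differences), which is a convenient sanity check before exploring the fractional orders in between.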

Keywords