Land (Jun 2024)
Feature Selection and Regression Models for Multisource Data-Based Soil Salinity Prediction: A Case Study of Minqin Oasis in Arid China
Abstract
(1) Monitoring salinized soils in saline–alkali land is essential and requires regional-scale inversion of soil salinity. This study aims to identify sensitive variables for predicting soil electrical conductivity (EC), with a focus on effective feature selection methods. (2) A feature subset was systematically selected from Sentinel-1 C-band SAR, Sentinel-2 MSI, and SRTM DEM data. Four feature selection methods were applied to 79 candidate variables: correlation analysis, the least absolute shrinkage and selection operator (LASSO), recursive feature elimination (RFE), and grey relational analysis (GRA). Regression models based on random forest regression (RF) and partial least squares regression (PLSR) were then constructed and compared. (3) The results show that the RFE algorithm is effective in reducing model complexity. The resulting model incorporates important environmental factors such as soil moisture, topography, and soil texture, which contribute substantially to the modeling. Combining RFE with RF improved soil salinity prediction (R² = 0.71, RMSE = 1.47, RPD = 1.84). Overall, soil salinization in the Minqin Oasis was evident, especially on unutilized land at the edge of the oasis. (4) Integrating multisource data to construct characterization variables overcomes the limitations of a single data source. Variable selection is an effective way to address redundant variable information and offers insights into feature engineering and variable selection for soil salinity estimation in arid and semi-arid regions.
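The abstract relies on two technical ideas: RFE as a way to prune the 79 candidate variables, and R², RMSE, and RPD as accuracy measures. The sketch below illustrates both in plain Python (standard library only). It is not the authors' code. The metric definitions are the common ones and are assumptions here: R² = 1 − SS_res/SS_tot, RMSE = √(mean squared residual), and RPD = standard deviation of the observed values divided by RMSE, using the population standard deviation. The RFE helper is a generic loop. Its `importance_fn` hook and the toy importances in the usage line are hypothetical stand-ins for the feature importances of a refitted random forest. A library class such as scikit-learn's `RFE` could play that role in practice.

```python
import math
from statistics import mean, pstdev


def regression_metrics(observed, predicted):
    """Return (R², RMSE, RPD) for paired observed/predicted values.

    Assumed definitions (common usage, not confirmed by the paper):
      R²   = 1 - SS_res / SS_tot
      RMSE = sqrt(SS_res / n)
      RPD  = population standard deviation of observed / RMSE
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mu = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mu) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R² is undefined")
    rmse = math.sqrt(ss_res / len(observed))
    r2 = 1.0 - ss_res / ss_tot
    # Perfect predictions give RMSE = 0, so RPD is reported as infinite.
    rpd = pstdev(observed) / rmse if rmse > 0 else math.inf
    return r2, rmse, rpd


def recursive_feature_elimination(feature_names, importance_fn, n_keep, step=1):
    """Generic RFE loop: score the current subset, drop the least important.

    importance_fn(selected) is a hypothetical hook that must return a dict
    {name: importance} for the currently selected features, e.g. after
    refitting a random forest on exactly those features.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    selected = list(feature_names)
    while len(selected) > n_keep:
        scores = importance_fn(selected)           # re-score on the current subset
        n_drop = min(step, len(selected) - n_keep)
        ranked = sorted(selected, key=lambda f: scores[f])
        dropped = set(ranked[:n_drop])              # lowest-importance features
        selected = [f for f in selected if f not in dropped]
    return selected


if __name__ == "__main__":
    print(regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5]))
    toy = {"a": 0.1, "b": 0.5, "c": 0.3, "d": 0.9}  # hypothetical importances
    print(recursive_feature_elimination(list(toy), lambda s: {f: toy[f] for f in s}, 2))
```

With the population standard deviation, these definitions give R² = 1 − 1/RPD², so a reported RPD of 1.84 corresponds to an R² of roughly 0.70. The refitting inside the loop is what separates RFE from a one-shot ranking. With a random forest, importances can shift once correlated variables are removed.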
Keywords