Ecological Indicators (Jan 2021)

Comparison of random forest and multiple linear regression models for estimation of soil extracellular enzyme activities in agricultural reclaimed coastal saline land

  • Xuefeng Xie,
  • Tao Wu,
  • Ming Zhu,
  • Guojun Jiang,
  • Yan Xu,
  • Xiaohan Wang,
  • Lijie Pu

Journal volume & issue
Vol. 120
p. 106925

Abstract

The alterations in soil physicochemical properties caused by the reclamation of coastal tidal land can strongly affect the activities of soil extracellular enzymes. Soil extracellular enzymes are among the most active organic components of the soil ecosystem and are involved in almost all of its biochemical reactions. Determining the importance of the factors that potentially influence soil extracellular enzymes, and thereby estimating their activities, is important for clarifying the biological mechanisms of soil carbon and nitrogen cycling. In this study, multiple linear regression (MLR) and random forest (RF) models were used to estimate soil amylase and urease activities from covariates such as soil water content (SWC), electrical conductivity (EC), total nitrogen (TN), total phosphorus (TP), and soil organic carbon (SOC), as well as soil bulk density (BD) and pH. The results reveal that the amylase activity of fishpond was significantly higher than that of the other land use types, while the urease activities of rape land, broad bean land, and fishpond were notably higher than those of bare flat, Spartina alterniflora, and uncultivated land. The RF model indicated that SWC and TN were the main variables affecting amylase and urease activity, respectively. The RF model performed much better than the MLR model in estimating soil amylase and urease activity, yielding much lower error indices (MAE and RMSE) and a higher R² value. The superiority of the RF model in estimating amylase and urease activity is attributed to its ability to handle nonlinear and hierarchical relationships between enzyme activities and covariates, and to its insensitivity to overfitting and to noise in the data.
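The three indices used above to compare the RF and MLR models can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `error_indices` is hypothetical, and R² is assumed here to be the coefficient of determination (1 minus the ratio of residual to total sum of squares), which the abstract does not specify.

```python
import math


def error_indices(observed, predicted):
    """Return (MAE, RMSE, R2) for paired observed and predicted values.

    Hypothetical helper for illustration; R2 is assumed to be the
    coefficient of determination, 1 - SS_res / SS_tot.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(r) for r in residuals) / n

    # Root mean square error: penalizes large residuals more strongly.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination relative to the mean of the observations.
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance; R2 is undefined")
    r2 = 1.0 - ss_res / ss_tot

    return mae, rmse, r2
```

Applied to the held-out predictions of each model, a lower MAE and RMSE together with a higher R² would point to the better estimator, which is the pattern the abstract reports for RF.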

Keywords