Geoderma (Dec 2023)

Towards a cost-effective framework for estimating soil nitrogen pools using pedotransfer functions and machine learning

  • Luke Laurence,
  • Brandon Heung,
  • Hardy Strom,
  • Kyra Stiles,
  • David Burton

Journal volume & issue
Vol. 440
p. 116692

Abstract

Globally, the strategic use of nitrogen (N) is important for optimizing economic returns and reducing soil N losses to the environment. Incorporating reliable estimates of the N mineralized over a growing season (GSN) into N-fertilizer rate prescriptions is critical, but direct measurements of GSN are often lacking. For this purpose, pedotransfer functions (PTFs) of total nitrogen (TN), representing the stable pool from which N is mineralized, and biological nitrogen availability (BNA), representing the labile pool of N mineralization, were used to estimate GSN. GSN was calculated based on TN and BNA results from a soil health database (SHD), which also includes a suite of related soil health parameters (n = 2222). Using a process of recursive feature elimination (RFE) and cost-benefit feature elimination (CBFE), the best predictors of TN, BNA, and GSN were identified using a suite of machine learners (MLs) and regression analysis. For TN, RFE revealed that BNA, active carbon (AC), sand (Sa), and soil organic matter (OM) were the best predictors, yielding a Lin’s concordance correlation coefficient (CCC) of 0.80 and a reduction in theoretical cost of 41 % compared to the control. CBFE identified AC, soil respiration (SR), clay, Sa, and OM as the most cost-effective predictors of TN, with a CCC of 0.79 and a theoretical cost 49 % below that of using all appropriate soil health parameters in the SHD. With respect to BNA, the best predictors from RFE were aggregate stability (AS), AC, SR, and TN, with a CCC of 0.78 and a theoretical cost reduction of 23 %. CBFE retained AC, SR, S, TN, OM, and pH as predictors of BNA, with a CCC of 0.78 and a reduction of 29 % in theoretical cost. Finally, for GSN, RFE identified AS, AC, SR, OM, and pH as the best predictors, with a CCC of 0.82 and a 17 % reduction in theoretical cost. CBFE, on the other hand, identified AC, SR, sand, OM, and pH as the most cost-efficient predictors while maintaining a CCC of 0.82 and a theoretical cost reduction of 29 %. Of the MLs used for pattern recognition (i.e., cubist, random forest, support vector machine, and stochastic gradient boosting), the cubist model outperformed the others in the majority of iterations of the RFE and CBFE processes. The cost-effective framework and the N-related PTFs developed in this study will greatly enhance our ability to predict soil N-pool dynamics and to incorporate GSN estimates into N-fertilizer recommendations for producers worldwide. Improvements in predictive strength could be achieved by incorporating climate and soil management practices into PTF development. Another area for improvement and future study is the addition of spatial and landscape variability related to N measures via digital soil mapping applications.
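
For readers unfamiliar with the accuracy metric reported above, the following is a minimal sketch of Lin’s concordance correlation coefficient, which measures how closely paired observed and predicted values fall on the 1:1 line. It is not the authors’ code: the function name `lins_ccc` and the sample values are hypothetical, and the population (1/n) form of the variances and covariance is assumed.

```python
from statistics import fmean


def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient for two paired sequences.

    Hypothetical helper for illustration; uses population (1/n) moments.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")

    n = len(observed)
    mean_o = fmean(observed)
    mean_p = fmean(predicted)

    # Population variances and covariance of the paired values.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n

    # The squared mean difference penalizes systematic bias (location shift),
    # which an ordinary correlation coefficient would ignore.
    denom = var_o + var_p + (mean_o - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both sequences are one identical constant")
    return 2 * cov / denom


# Made-up values: predictions that are perfectly correlated with the
# observations but biased upward by one unit score well below 1.
ccc_biased = lins_ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

A CCC of 1 indicates perfect agreement, 0 indicates none, and negative values indicate inverse agreement. Unlike Pearson’s r, the coefficient drops when predictions are shifted or rescaled relative to the observations.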

Keywords