Applied Sciences (May 2022)

Modelling Soil Temperature by Tree-Based Machine Learning Methods in Different Climatic Regions of China

  • Jianhua Dong,
  • Guomin Huang,
  • Lifeng Wu,
  • Fa Liu,
  • Sien Li,
  • Yaokui Cui,
  • Yicheng Wang,
  • Menghui Leng,
  • Jie Wu,
  • Shaofei Wu

DOI
https://doi.org/10.3390/app12105088
Journal volume & issue
Vol. 12, no. 10
p. 5088

Abstract

Read online

Accurate estimation of soil temperature (Ts) at a national scale under different climatic conditions is important for soil–plant–atmosphere interactions. This study estimated daily Ts at the 0 cm depth for 689 meteorological stations in seven different climate zones of China for the period 1966–2015 with the M5P model tree (M5P), random forests (RF), and the extreme gradient boosting (XGBoost). The results showed that the XGBoost model (averaged coefficient of determination (R2) = 0.964 and root mean square error (RMSE) = 2.066 °C) overall performed better than the RF (averaged R2 = 0.959 and RMSE = 2.130 °C) and M5P (averaged R2 = 0.954 and RMSE = 2.280 °C) models for estimating Ts with higher computational efficiency. With the combination of mean air temperature (Tmean) and global solar radiation (Rs) as inputs, the estimating accuracy of the models was considerably high (averaged R2 = 0.96–0.97 and RMSE = 1.73–1.99 °C). On the basis of Tmean, adding Rs to the model input had a greater degree of influence on model estimating accuracy than adding other climatic factors to the input. Principal component analysis indicated that soil organic matter, soil water content, Tmean, relative humidity (RH), Rs, and wind speed (U2) are the main factors that cause errors in estimating Ts, and the total error interpretation rate was 97.9%. Overall, XGBoost would be a suitable algorithm for estimating Ts in different climate zones of China, and the combination of Tmean and Rs as model inputs would be more practical than other input combinations.

Keywords