Water (Mar 2023)

Forecasting Monthly Water Deficit Based on Multi-Variable Linear Regression and Random Forest Models

  • Yi Li,
  • Kangkang Wei,
  • Ke Chen,
  • Jianqiang He,
  • Yong Zhao,
  • Guang Yang,
  • Ning Yao,
  • Ben Niu,
  • Bin Wang,
  • Lei Wang,
  • Puyu Feng,
  • Zhe Yang

DOI
https://doi.org/10.3390/w15061075
Journal volume & issue
Vol. 15, no. 6
p. 1075

Abstract

Forecasting water deficit is challenging because it is modulated by an uncertain climate and by various environmental and anthropogenic factors, especially in arid and semi-arid northwestern China. The monthly water deficit index D at 44 sites in northwestern China over 1961–2020 was calculated. The key large-scale circulation indices related to D were screened using Pearson’s correlation (r). Subsequently, we predicted monthly D with multi-variable linear regression (MLR) and random forest (RF) models at certain lag times after strict calibration and validation. The results showed the following: (1) The r between the monthly D and the screened key circulation indices varied from 0.71 to 0.85, and the lag time ranged from 1 to 12 months. (2) The calibrated and validated performance of the established MLR and RF models was good at all 44 sites. Overall, the RF model outperformed the MLR model with a higher coefficient of determination (R2 > 0.8 at 38 sites) and mean absolute percentage error (MAPE D in northwestern China, followed by SSRP, WPWPA, NANRP, and PPVA. (4) The forecasted monthly D values based on RF models indicated that the water deficit in northwestern China would be most severe (−239.7 to −62.3 mm) in August 2022. In conclusion, using multiple large-scale climate signals to drive a machine learning model is a promising method for predicting water deficit conditions in northwestern China.

Keywords