Agricultural Water Management (Sep 2025)

An interpretable coupled model (SWAT-STFT) for multispatial-multistep evapotranspiration prediction in the river basin

  • Zhonghui Guo,
  • Chang Feng,
  • Liu Yang,
  • Qing Liu

DOI
https://doi.org/10.1016/j.agwat.2025.109742
Journal volume & issue
Vol. 318
p. 109742

Abstract

Evapotranspiration (ET) is a critical component of the hydrological cycle, and its spatiotemporal prediction and interpretation are essential for managing agricultural water resources in river basins. However, both physics-based models (PBM) and data-driven models (DDM) have inherent limitations in watershed ET modeling and mechanistic interpretation. Coupling them offers a potential solution by integrating reliable hydrological physical mechanisms with robust nonlinear learning capabilities. This study developed an interpretable coupled model by embedding physical hydrological constraints derived from the Soil and Water Assessment Tool (SWAT) into an attention-based Spatio-Temporal Fusion Transformer (STFT, incorporating spatiotemporal cross-learning mechanisms). The model achieved multispatial-multistep prediction (MMP) of watershed ET while providing self-spatiotemporal interpretation (SSTI) of the underlying ET mechanisms. Applied to the Xiangjiang River Basin (XRB), the coupled model predicted ET across 103 subbasins with a 6-month lead time, achieving median Nash-Sutcliffe Efficiency (NSE) and coefficient of determination (R²) values both exceeding 0.9 during the testing period. It outperformed LSTM and Transformer-based baseline models in both individual and SWAT-coupled scenarios. These results demonstrated that the SWAT-STFT coupled model achieved satisfactory ET MMP performance. Additionally, the coupled model's SSTI leveraged model-internal attention weights to provide new insights into the spatiotemporal mechanisms of watershed ET. The global feature interpretation revealed that meteorological features were the dominant ET drivers among time-varying features (48 % importance), followed by land use features (35 %), while soil features (64 %) dominated the static features.
Temporal attention interpretation showed a bimodal pattern, with attention peaks at the t-24 and t-7 month time steps (normalized attention weights of 0.83 and 0.48, respectively), reflecting the model's sensitivity to both long-term climate trends and seasonal transitions. Spatial effect interpretation revealed distinct ET heterogeneity across subbasins, with midstream regions showing 55 % above-average importance for forest land features, and critical areas identified in the eastern and southern XRB. This integration of physics-based and data-driven modeling not only provides valuable insights into watershed ET prediction and mechanistic understanding but also underscores the broader potential for application across global watersheds and related disciplines.
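
The abstract reports skill as median NSE and R² across subbasins. As a rough illustration of what those two scores measure, the sketch below computes both per subbasin and takes the median. It is not the authors' evaluation code. The function names are hypothetical, and R² is assumed here to be the squared Pearson correlation, which the abstract does not specify.

```python
from statistics import mean, median


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus squared error over observed variance."""
    obs_mean = mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - sse / sst


def r_squared(obs, sim):
    """R² taken as the squared Pearson correlation (an assumption, see lead-in)."""
    obs_mean, sim_mean = mean(obs), mean(sim)
    cov = sum((o - obs_mean) * (s - sim_mean) for o, s in zip(obs, sim))
    var_obs = sum((o - obs_mean) ** 2 for o in obs)
    var_sim = sum((s - sim_mean) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


def median_scores(obs_by_subbasin, sim_by_subbasin):
    """Median NSE and R² over paired per-subbasin series (hypothetical helper)."""
    pairs = list(zip(obs_by_subbasin, sim_by_subbasin))
    nse_values = [nse(o, s) for o, s in pairs]
    r2_values = [r_squared(o, s) for o, s in pairs]
    return median(nse_values), median(r2_values)
```

The two scores are not interchangeable: a simulation that tracks the timing of observed ET but doubles its magnitude can have R² equal to 1 while its NSE is strongly negative, because NSE also penalizes bias. That is one reason to report both.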

Keywords