PeerJ (Nov 2022)

Machine learning based estimation of field-scale daily, high resolution, multi-depth soil moisture for the Western and Midwestern United States

  • Yushu Xia,
  • Jennifer D. Watts,
  • Megan B. Machmuller,
  • Jonathan Sanderman

DOI
https://doi.org/10.7717/peerj.14275
Journal volume & issue
Vol. 10
p. e14275

Abstract


Background
High-resolution soil moisture estimates are critical for planning water management and assessing environmental quality. In-situ measurements alone are too costly to support the spatial and temporal resolutions needed for water management. Recent efforts have combined calibration data with machine learning algorithms to fill the gap where high-resolution moisture estimates are lacking at the field scale. This study aimed to provide calibrated soil moisture models and a methodology for generating gridded estimates of soil moisture at multiple depths, according to user-defined temporal periods, spatial resolution, and extent.

Methods
We applied nearly one million national library soil moisture records from over 100 sites, spanning the U.S. Midwest and West, to build Quantile Random Forest (QRF) calibration models. The QRF models were built on covariates including soil moisture estimates from the North American Land Data Assimilation System (NLDAS), soil properties, climate variables, digital elevation models, and remote sensing-derived indices. We also explored an alternative approach that adopted a regionalized calibration dataset for the Western U.S. The broad-scale QRF models were independently validated according to sampling depth, land cover type, and observation period. We then explored whether model performance improved when local samples were used for spiking. Finally, the QRF models were applied to estimate soil moisture at the field scale, where the estimated temporal and spatial patterns were evaluated.

Results
The broad-scale QRF model showed moderate performance (R² = 0.53, RMSE = 0.078 m³/m³) when data points from all depth layers (up to 100 cm) were considered for an independent validation. Elevation, NLDAS-derived moisture, soil properties, and sampling depth were ranked as the most important covariates.
The best model performance was observed for forest and pasture sites (R² > 0.5; RMSE [...] 0.4; RMSE [...] 0.5, RMSE < 0.08 m³/m³) showed promise for adopting the methodology for soil moisture monitoring. The success of spiking the national model with local samples showed the need to collect multi-year, high-frequency (e.g., hourly) sensor-based field measurements to improve soil moisture estimates over longer time periods. Future work should improve model performance at deeper depths by adding hydraulic properties and using locally selected calibration datasets.
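
The two validation metrics quoted in the abstract can be illustrated with a minimal Python sketch. It assumes R² is the coefficient of determination (1 - SS_res / SS_tot); the abstract does not say which definition the authors used, and a squared correlation coefficient would give different values. The function names and sample values below are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the inputs (e.g. m3/m3)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up volumetric soil moisture values (m3/m3), for illustration only.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]

print(f"RMSE = {rmse(observed, predicted):.3f} m3/m3")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

Under this definition, RMSE is reported in the units of soil moisture itself, which is why the abstract can compare it across depths and land cover types.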

Keywords