Frontiers in Water (Aug 2020)

Sequential Imputation of Missing Spatio-Temporal Precipitation Data Using Random Forests

  • Utkarsh Mital,
  • Dipankar Dwivedi,
  • James B. Brown,
  • Boris Faybishenko,
  • Scott L. Painter,
  • Carl I. Steefel

DOI
https://doi.org/10.3389/frwa.2020.00020
Journal volume & issue
Vol. 2

Abstract

Meteorological records, including precipitation, commonly have missing values. Accurate imputation of missing precipitation values is challenging, however, because precipitation exhibits a high degree of spatial and temporal variability. Data-driven spatial interpolation of meteorological records is an increasingly popular approach in which missing values at a target station are imputed using synchronous data from reference stations. The success of spatial interpolation depends on whether precipitation records at the target station are strongly correlated with precipitation records at reference stations. However, the need for reference stations to have complete datasets implies that stations with incomplete records, even though strongly correlated with the target station, are excluded. To address this limitation, we develop a new sequential imputation algorithm for imputing missing values in spatio-temporal daily precipitation records. We demonstrate the benefits of sequential imputation by incorporating it within a spatial interpolation based on a Random Forest technique. Results show that for reliable imputation, having a few strongly correlated references is more effective than having a larger number of weakly correlated references. Further, we observe that sequential imputation becomes more beneficial as the number of stations with incomplete records increases. Overall, we present a new approach for imputing missing precipitation data which may also apply to other meteorological variables.
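The sequential idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sequential_impute`, the processing order (least-incomplete station first), the correlation threshold `min_corr`, and the Random Forest settings are all assumptions made for illustration. The sketch also assumes that at least one station has a complete record and that every station has several observed days. The one point taken from the abstract is that a station joins the reference pool once it has been imputed, so it can serve as a reference for later targets.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def sequential_impute(precip, min_corr=0.5, seed=0):
    """Fill NaNs in a (days x stations) array, one station at a time.

    Hypothetical sketch: ordering, threshold and model settings are
    illustrative assumptions, not values taken from the paper.
    """
    filled = np.array(precip, dtype=float)  # working copy
    missing = np.isnan(filled)
    n_stations = filled.shape[1]

    # Stations with complete records form the initial reference pool.
    pool = [j for j in range(n_stations) if not missing[:, j].any()]
    if not pool:
        raise ValueError("at least one complete station is required")

    # Assumed order: impute the stations with the fewest gaps first.
    targets = [j for j in range(n_stations) if missing[:, j].any()]
    targets.sort(key=lambda j: missing[:, j].sum())

    for target in targets:
        observed = ~missing[:, target]
        y = filled[observed, target]

        # Correlation with each pool station on days the target was observed.
        corr = np.nan_to_num(
            [abs(np.corrcoef(filled[observed, r], y)[0, 1]) for r in pool]
        )
        refs = [r for r, c in zip(pool, corr) if c >= min_corr]
        if not refs:  # fall back to the single best-correlated reference
            refs = [pool[int(np.argmax(corr))]]

        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        model.fit(filled[observed][:, refs], y)
        filled[~observed, target] = model.predict(filled[~observed][:, refs])

        # The imputed station can now act as a reference for later targets.
        pool.append(target)

    return filled
```

Calling `sequential_impute(data)` on a days-by-stations array returns a copy in which observed values are unchanged and gaps are filled. Selecting references by a correlation threshold mirrors the abstract's finding that a few strongly correlated references are more useful than many weak ones, although the threshold value used here is arbitrary.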

Keywords