WaterBench-Iowa: a large-scale benchmark dataset for data-driven streamflow forecasting

I. Demir; I. Demir; Z. Xiang; B. Demiray; M. Sit

doi:10.5194/essd-14-5605-2022

Earth System Science Data (Dec 2022)

Affiliations

I. Demir: Department of Civil and Environmental Engineering, University of Iowa, Iowa City, 52246 Iowa, USA
I. Demir: Department of Electrical and Computer Engineering, University of Iowa, Iowa City, 52246 Iowa, USA
Z. Xiang: Department of Civil and Environmental Engineering, University of Iowa, Iowa City, 52246 Iowa, USA
B. Demiray: Interdisciplinary Graduate Program in Informatics, University of Iowa, Iowa City, 52246 Iowa, USA
M. Sit: Interdisciplinary Graduate Program in Informatics, University of Iowa, Iowa City, 52246 Iowa, USA

Journal volume & issue: Vol. 14
pp. 5605–5616

Abstract

This study proposes WaterBench-Iowa, a comprehensive benchmark dataset for streamflow forecasting that follows the FAIR (findability, accessibility, interoperability, and reuse) data principles and is prepared to be convenient for data-driven and machine learning studies. The study also provides benchmark performance of state-of-the-art deep learning architectures on the dataset for comparative analysis. By aggregating streamflow, precipitation, watershed area, slope, soil type, and evapotranspiration datasets from federal agencies and state organizations (i.e., NASA, NOAA, USGS, and the Iowa Flood Center), we prepared WaterBench-Iowa for hourly streamflow forecasting studies. The dataset has high temporal and spatial resolution with rich metadata and relational information, and it can be used for a variety of deep learning and machine learning research. We defined a sample benchmark task of predicting hourly streamflow for the next 5 days (120 h) to support future comparative studies, and we provide benchmark results on this task for a sample linear regression model and several deep learning models: long short-term memory (LSTM), gated recurrent units (GRU), and sequence-to-sequence (S2S). Our benchmark results show a median Nash–Sutcliffe efficiency (NSE) of 0.74 and a median Kling–Gupta efficiency (KGE) of 0.79 across 125 watersheds for the 120 h ahead streamflow prediction task. WaterBench-Iowa addresses the lack of unified benchmarks in earth science research and can be accessed at Zenodo: https://doi.org/10.5281/zenodo.7087806 (Demir et al., 2022a).
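The benchmark scores above are medians of per-watershed NSE and KGE values. As a minimal sketch, assuming the standard NSE definition and the original three-component KGE of Gupta et al. (2009), the functions below compute both metrics for one watershed's observed and simulated hourly series and then take the median across watersheds. The function names `nse`, `kge`, and `median_scores` and the toy watershed ids are hypothetical; the abstract does not state which KGE variant the authors used or how the 120 lead times are pooled, so treat this as an illustration of the metrics rather than the authors' evaluation code.

```python
import math
from statistics import mean, median, pstdev


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2).

    1.0 is a perfect fit; 0.0 means no better than predicting the observed mean.
    """
    if len(obs) != len(sim):
        raise ValueError("observed and simulated series must have the same length")
    obs_mean = mean(obs)
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))
    var = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - sse / var


def kge(obs, sim):
    """Kling-Gupta efficiency, original form (assumed variant):

    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where
    r is the Pearson correlation, alpha = std(sim) / std(obs),
    and beta = mean(sim) / mean(obs).
    """
    if len(obs) != len(sim):
        raise ValueError("observed and simulated series must have the same length")
    mo, ms = mean(obs), mean(sim)
    so, ss = pstdev(obs), pstdev(sim)
    # Population covariance, consistent with the population standard deviations above.
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (so * ss)
    alpha = ss / so
    beta = ms / mo
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def median_scores(results):
    """Median NSE and KGE across watersheds.

    `results` maps a (hypothetical) watershed id to an (observed, simulated) pair
    of equal-length streamflow series for the evaluation period.
    """
    nses = [nse(o, s) for o, s in results.values()]
    kges = [kge(o, s) for o, s in results.values()]
    return median(nses), median(kges)


if __name__ == "__main__":
    # Toy series for three hypothetical watersheds (not WaterBench-Iowa data).
    toy = {
        "ws_a": ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]),
        "ws_b": ([5.0, 7.0, 6.0, 9.0], [5.5, 6.5, 6.2, 8.0]),
        "ws_c": ([2.0, 2.5, 4.0, 3.0], [2.2, 2.4, 3.5, 3.3]),
    }
    print(median_scores(toy))
```

Taking the median rather than the mean keeps a few poorly modeled watersheds (where NSE can be strongly negative) from dominating the summary score.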

Published in Earth System Science Data

ISSN: 1866-3508 (Print); 1866-3516 (Online)
Publisher: Copernicus Publications
Country of publisher: Germany
LCC subjects: Geography. Anthropology. Recreation: Environmental sciences; Science: Geology
Website: http://www.earth-system-science-data.net/

About the journal