Ecological Indicators (Mar 2021)
Heed the data gap: Guidelines for using incomplete datasets in annual stream temperature analyses
Abstract
Stream temperature data are useful for deciphering watershed processes important for aquatic ecosystems. Accurately extracting signal trends from stream temperature is essential for predicting responses of environmental and ecological indicators to change. Missing data periods are common for various reasons and pose a challenge for scientists using temperature signal analysis to support stream research and ecological management objectives. However, the sensitivity of estimated temperature signal patterns to missing data has not been thoroughly evaluated, despite the potentially large impact on interpretation. In this study, we explored the effects of simulated missing daily data on the characterization of annual water temperature signals measured at headwater sites in the Pacific Northwest and Mid-Atlantic regions of the USA. For each site, we used linear regressions of sine-waves fitted to complete (365-d) and partial (with 7–357 consecutive missing data points) annual datasets of daily mean water temperature and computed three thermal parameters (mean, phase, and amplitude), which together can indicate thermally and ecologically influential watershed processes (e.g., depth and magnitude of groundwater discharge). Expected values (derived from complete datasets) ranged from 7.0 to 12.6 °C, 205 to 254 d, and 1.9 to 9.5 °C for annual mean, phase, and amplitude, respectively. While annual phase and amplitude could be accurately estimated (i.e., within 95–99% confidence intervals of expected values) with up to approximately two months of consecutively missing data, annual mean temperature required more complete datasets. We found that datasets with fewer than seven weeks of consecutively missing data enabled estimation of all annual signal parameters with reasonable accuracy (>75% probability of being within the 95–99% confidence intervals of expected values).
Imputation of missing data expanded this range to approximately 20 weeks, with the greatest improvements in parameter estimation between 9 and 27 weeks of imputed missing data. However, imputation should be applied with caution: it improved the accuracy of parameter estimation for most sites, but accuracy decreased for some sites exhibiting strong groundwater influence. The timing of consecutive missing data points within a year had inconsistent effects on annual thermal parameter estimates among regions, years, and individual parameters. Using sites with more than approximately seven consecutive weeks of missing data or 20 weeks of imputed data increases the probability of mischaracterizing annual stream thermal regimes. Understanding this limitation is vital for identifying the potential of streams to serve as climate refugia for ecological indicator species and for effective future management of stream systems.
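
To make the fitting approach described in the abstract concrete, the following is a minimal sketch of how an annual sine wave can be fitted to daily mean temperatures by linear regression, and how a block of consecutive missing days can be simulated. It is an illustration under stated assumptions, not the authors' code: the function name `fit_annual_sine`, the 365-d period, the synthetic temperature series, and the choice to express phase as the day of year of the fitted annual maximum are all assumptions introduced here. The abstract reports phase in days but does not state the convention used.

```python
import numpy as np

PERIOD = 365.0  # assumed length of the annual cycle, in days


def fit_annual_sine(day, temp):
    """Least-squares fit of T(d) = mean + a*sin(w) + b*cos(w), w = 2*pi*d/PERIOD.

    Days with NaN temperature are skipped, which mimics a data gap.
    Returns (mean, peak_day, amplitude); peak_day is the day of year at
    which the fitted curve reaches its maximum (an assumed phase convention).
    """
    day = np.asarray(day, dtype=float)
    temp = np.asarray(temp, dtype=float)
    keep = ~np.isnan(temp)

    w = 2.0 * np.pi * day[keep] / PERIOD
    # The model is linear in (mean, a, b), so ordinary least squares applies.
    design = np.column_stack([np.ones(w.size), np.sin(w), np.cos(w)])
    (mean, a, b), *_ = np.linalg.lstsq(design, temp[keep], rcond=None)

    amplitude = np.hypot(a, b)
    # a*sin(w) + b*cos(w) = amplitude*cos(w - phi) with phi = atan2(a, b),
    # so the fitted maximum occurs where w equals phi.
    phi = np.arctan2(a, b) % (2.0 * np.pi)
    peak_day = phi * PERIOD / (2.0 * np.pi)
    return float(mean), float(peak_day), float(amplitude)


# Synthetic example (hypothetical values, not data from the study).
rng = np.random.default_rng(0)
days = np.arange(1, 366)
temps = 10.0 + 5.0 * np.cos(2.0 * np.pi * (days - 220) / PERIOD)
temps = temps + rng.normal(0.0, 0.5, size=days.size)

# Simulate seven consecutive weeks (49 days) of missing data.
temps_gap = temps.copy()
temps_gap[100:149] = np.nan

full_fit = fit_annual_sine(days, temps)
gap_fit = fit_annual_sine(days, temps_gap)
print("complete:", full_fit)
print("with gap:", gap_fit)
```

Comparing `full_fit` with `gap_fit` for gaps of different lengths and start dates gives a rough sense of the sensitivity analysis the abstract describes. The confidence-interval criteria and the imputation step are not reproduced here.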