Revista Ambiente & Água (Apr 2022)

Methodological approaches for imputing missing data into monthly flows series

  • Michel Trarbach Bleidorn,
  • Wanderson de Paula Pinto,
  • Isamara Maria Schmidt,
  • Antonio Sergio Ferreira Mendonça,
  • José Antonio Tosta dos Reis

DOI
https://doi.org/10.4136/ambi-agua.2795
Journal volume & issue
Vol. 17, no. 2
pp. 1 – 27

Abstract

Missing data are one of the main difficulties in working with fluviometric records. Gaps in databases may result from equipment failures at fluviometric stations, interruptions in monitoring, and a lack of observers. Analyzing incomplete series produces uncertain results, which negatively affects water resources management. Proper treatment of missing data is therefore essential to ensure information quality. This work compares missing data imputation methodologies for monthly river-flow time series, using the Doce River, in Southeast Brazil, as a case study. Missing data were simulated in proportions of 5%, 10%, 15%, 25% and 40% following a random distribution pattern, ignoring the mechanisms that generate missing data. Ten imputation methodologies were used: arithmetic mean, median, simple and multiple linear regression, regional weighting, spline and Stineman interpolation, Kalman smoothing, multiple imputation and maximum likelihood. Their performances were compared through bias, root mean square error, mean absolute percentage error, coefficient of determination and concordance index. Results indicate that, for 5% missing data, any of the imputation methodologies can be considered, although caution is recommended when applying the arithmetic mean method. As the proportion of missing data increases, however, multiple imputation and maximum likelihood are recommended when support stations are available for imputation, and Stineman interpolation and Kalman smoothing when only the studied series is available.
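
The evaluation design described in the abstract (remove values at random, impute them, score the imputed values against the withheld observations) can be illustrated with a minimal sketch. It is not the authors' implementation: the flow series is synthetic rather than Doce River data, all function names are hypothetical, only three of the ten methods are shown (arithmetic mean, median, and simple linear regression on one support station), and the concordance index is assumed here to be Willmott's index of agreement.

```python
import numpy as np


def simulate_gaps(series, proportion, rng):
    """Return a copy of `series` with `proportion` of values set to NaN at random."""
    gapped = series.astype(float).copy()
    n_missing = int(round(proportion * series.size))
    idx = rng.choice(series.size, size=n_missing, replace=False)
    gapped[idx] = np.nan
    return gapped


def impute_mean(gapped):
    """Fill gaps with the arithmetic mean of the remaining observations."""
    filled = gapped.copy()
    filled[np.isnan(gapped)] = np.nanmean(gapped)
    return filled


def impute_median(gapped):
    """Fill gaps with the median of the remaining observations."""
    filled = gapped.copy()
    filled[np.isnan(gapped)] = np.nanmedian(gapped)
    return filled


def impute_regression(gapped, support):
    """Fill gaps from a simple linear regression on a complete support series."""
    mask = np.isnan(gapped)
    slope, intercept = np.polyfit(support[~mask], gapped[~mask], 1)
    filled = gapped.copy()
    filled[mask] = intercept + slope * support[mask]
    return filled


def scores(observed, imputed):
    """Score imputed values against the withheld observations."""
    err = imputed - observed
    obs_mean = observed.mean()
    # R^2 is undefined when every imputed value is identical (mean, median).
    if imputed.std() > 0:
        r2 = np.corrcoef(observed, imputed)[0, 1] ** 2
    else:
        r2 = float("nan")
    # Assumption: concordance index taken as Willmott's index of agreement.
    denom = ((np.abs(imputed - obs_mean) + np.abs(observed - obs_mean)) ** 2).sum()
    return {
        "bias": err.mean(),
        "rmse": np.sqrt((err ** 2).mean()),
        "mape": 100.0 * np.mean(np.abs(err / observed)),
        "r2": r2,
        "d": 1.0 - (err ** 2).sum() / denom,
    }


# Synthetic 20-year monthly series (illustrative values only).
rng = np.random.default_rng(42)
months = np.arange(240)
target = 500.0 + 250.0 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 50, 240)
support = 0.8 * target + rng.normal(0, 30, 240)  # hypothetical support station

for proportion in (0.05, 0.10, 0.15, 0.25, 0.40):
    gapped = simulate_gaps(target, proportion, rng)
    mask = np.isnan(gapped)
    candidates = {
        "mean": impute_mean(gapped),
        "median": impute_median(gapped),
        "regression": impute_regression(gapped, support),
    }
    for name, filled in candidates.items():
        s = scores(target[mask], filled[mask])
        print(f"{proportion:.0%} {name:<10} rmse={s['rmse']:.1f} d={s['d']:.2f}")
```

Scoring only the positions that were artificially removed keeps the comparison focused on imputation error rather than on values that were never missing.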

Keywords