Investigation of using missing data imputation methodologies effect on the SARIMA model performance: application to average monthly flows

Michel Trarbach Bleidorn; Isamara Maria Schmidt; José Antonio Tosta dos Reis; Deysilara Figueira Pani; Wanderson de Paula Pinto; Carlo Corrêa Solci; Antonio Sergio Ferreira Mendonça; Gutemberg Hespanha Brasil

doi:10.1590/2318-0331.292420230131

Revista Brasileira de Recursos Hídricos (Aug 2024)

DOI: https://doi.org/10.1590/2318-0331.292420230131
Journal volume & issue: Vol. 29

Abstract

Accurate river flow forecasts are crucial in hydrology but are challenged by the quality of fluviometric data. This study investigates the impact of different missing data imputation methods on the performance of the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. A SARIMA(1,1,1)(0,1,1)12 model was selected using semi-automated criteria: lowest AIC, significant parameters (p-value < 0.05) and residual adequacy. This model was then compared against series reconstructed with different imputation methods: Mean (AM), Median (M), Spline and Stineman interpolations, Regional Weighting (RW), Multiple Linear Regression (MLR), Multiple Imputation (MI) and Maximum Likelihood (ML). Scenarios with 5, 20 and 40% missing data, following random and block patterns, were analyzed using data from the Doce River in Southeast Brazil. The performance indicators and their relative differences indicated that the univariate AM and M methods and the multivariate RW and MLR methods limited the model's performance, while the univariate Spline and Stineman methods and the multivariate MI and ML methods did not present significant limitations, except for Spline under the block pattern. It is concluded that the accuracy of future predictions depends not only on a well-trained and validated model but also on the appropriate use of missing data imputation methods.
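The missing-data scenarios described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes a synthetic monthly series rather than Doce River data, covers only the mean (AM) and median (M) methods, does not fit a SARIMA model, and every function name in it is hypothetical. It only shows how random and block gaps at 5, 20 and 40% might be generated and how an imputed series could be scored against the original.

```python
import random
import statistics

def make_series(n_years=10, seed=0):
    """Synthetic monthly series with a 12-month cycle (illustrative, not real flows)."""
    rng = random.Random(seed)
    seasonal = [120, 110, 90, 70, 50, 40, 35, 30, 35, 50, 80, 110]
    return [seasonal[m % 12] + rng.gauss(0, 5) for m in range(12 * n_years)]

def random_gaps(n, fraction, seed=0):
    """Random pattern: indices removed one at a time, scattered over the series."""
    rng = random.Random(seed)
    return set(rng.sample(range(n), round(n * fraction)))

def block_gaps(n, fraction, seed=0):
    """Block pattern: indices removed as one contiguous run."""
    rng = random.Random(seed)
    k = round(n * fraction)
    start = rng.randrange(0, n - k + 1)
    return set(range(start, start + k))

def impute(series, missing, method):
    """Fill missing positions with the mean or median of the observed values."""
    observed = [v for i, v in enumerate(series) if i not in missing]
    fill = statistics.mean(observed) if method == "mean" else statistics.median(observed)
    return [fill if i in missing else v for i, v in enumerate(series)]

def rmse(a, b):
    """Root mean squared error between two equal-length series."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

series = make_series()
for fraction in (0.05, 0.20, 0.40):
    for pattern, gaps in (("random", random_gaps), ("block", block_gaps)):
        missing = gaps(len(series), fraction)
        for method in ("mean", "median"):
            filled = impute(series, missing, method)
            print(f"{fraction:.0%} {pattern:6s} {method:6s} RMSE={rmse(series, filled):.2f}")
```

A constant fill ignores the seasonal cycle entirely, which is one plausible reason such methods could degrade a seasonal model; the study itself evaluates this through SARIMA performance indicators rather than the direct reconstruction error used here.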

Published in Revista Brasileira de Recursos Hídricos

ISSN: 2318-0331 (Online)
Publisher: Associação Brasileira de Recursos Hídricos
Country of publisher: Brazil
LCC subjects: Technology: Hydraulic engineering: River, lake, and water-supply engineering (General); Geography. Anthropology. Recreation: Environmental sciences
Website: http://www.abrh.org.br/RBRH
