IEEE Access (Jan 2024)

Incorporating Seasonal Features in Data Imputation Methods for Power Demand Time Series

  • Dmitrii Vasenin,
  • Marco Pasetti,
  • Davide Astolfi,
  • Nikita Savvin,
  • Stefano Rinaldi,
  • Alberto Berizzi

DOI
https://doi.org/10.1109/ACCESS.2024.3434652
Journal volume & issue
Vol. 12
pp. 103520 – 103536

Abstract


This paper addresses the critical issue of missing data in power demand time series, emphasizing the relevance of imputation-based approaches for data-driven technologies. A comparative analysis of imputation methods is performed, using K-Nearest Neighbors (KNN) applied in the time domain as the state-of-the-art reference. Two innovative methods are proposed. The first, named Historical Data Informed Regression Technique (H-DIRT), incorporates historical data to set up a multivariate linear regression and then imputes each missing power demand measurement through the estimated relation between that measurement and the historical data. When the available historical data are insufficient, the algorithm falls back to averaging, or to linear interpolation, between the nearest available measurements before and after the missing value. The second, named Seasonal KNN (SKNN), enriches the data set with features related to yearly, seasonal, weekly, and daily trends and then applies baseline KNN. Experiments are set up with both random and continuous data clipping, including rather extreme pruning (up to 70% of the data). In general, the results demonstrate a significant improvement in imputation accuracy compared to the state of the art. For the SKNN method, the average error metrics (such as Mean Absolute Error and Root Mean Square Error) are roughly one third and one half of those of the baseline KNN in the cases of random and continuous data clipping, respectively. Overall, the SKNN method provides more accurate results and better captures the statistical features of the data set to be imputed. However, if the share of data to impute is not too large, the H-DIRT method provides comparable accuracy at a much lower computational cost.
Hence, this study presents an easily implementable and computationally affordable approach for improving the state of the art in power demand data imputation in various contexts. It also establishes a foundation for future exploration of the trends, seasonal factors, and external variables that influence power load parameters.
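The H-DIRT idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which historical values enter the regression, so the sketch assumes hourly data and uses the same hour one day (24 steps) and one week (168 steps) earlier as regressors. The function names `lagged`, `interpolate_gap` and `hdirt_like_impute` are hypothetical. A missing value is predicted by ordinary least squares when all of its lagged values are observed; otherwise it falls back to linear interpolation between the nearest observed values, which reduces to their average for an isolated missing point.

```python
import numpy as np

# Hypothetical lag choice (assumption): hourly data, regress on the value
# observed 1 day (24 steps) and 1 week (168 steps) earlier.
LAGS = (24, 168)


def lagged(x: np.ndarray, lag: int) -> np.ndarray:
    """Return x shifted forward by `lag` steps, padding the start with NaN."""
    out = np.full_like(x, np.nan)
    out[lag:] = x[:-lag]
    return out


def interpolate_gap(x: np.ndarray, i: int) -> float:
    """Fallback: linear interpolation between the nearest observed values
    before and after index i (or the single available one at an edge)."""
    before = np.flatnonzero(~np.isnan(x[:i]))
    after = np.flatnonzero(~np.isnan(x[i + 1:])) + i + 1
    if before.size and after.size:
        b, a = before[-1], after[0]
        w = (i - b) / (a - b)
        return float((1 - w) * x[b] + w * x[a])
    if before.size:
        return float(x[before[-1]])
    if after.size:
        return float(x[after[0]])
    return float("nan")


def hdirt_like_impute(x, lags=LAGS) -> np.ndarray:
    """Regression-on-history imputation with an interpolation fallback."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([lagged(x, lag) for lag in lags])
    design = np.column_stack([np.ones(len(x)), X])  # intercept + lag features
    # Fit once, only on rows where the target and all lags are observed.
    train = ~np.isnan(x) & ~np.isnan(X).any(axis=1)
    coef, *_ = np.linalg.lstsq(design[train], x[train], rcond=None)
    out = x.copy()
    for i in np.flatnonzero(np.isnan(x)):
        if not np.isnan(X[i]).any():
            out[i] = design[i] @ coef  # enough history: use the regression
        else:
            out[i] = interpolate_gap(x, i)  # insufficient history: interpolate
    return out
```

In this sketch the model is fitted once and each imputation is a single dot product or interpolation; imputed values are never fed back as regressors.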
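A similar hedged sketch can illustrate the SKNN idea of enriching the data set before running plain KNN. The abstract only names yearly, seasonal, weekly and daily features, so the concrete encodings here are assumptions: sine/cosine pairs for day of year, day of week and hour of day, a one-hot meteorological season, and elapsed time in years as the time-domain feature. The helpers (`cyclic`, `elapsed_years`, `seasonal_features`, `sknn_features`, `knn_impute`) are hypothetical. `knn_impute` fills each missing value with the mean over its k nearest observed rows; passing only the elapsed-time column gives a time-domain KNN baseline, while passing the enriched matrix gives an SKNN-style imputer. The demo uses a synthetic series, not data from the paper.

```python
from datetime import datetime, timedelta

import numpy as np


def cyclic(values, period):
    """Encode a periodic quantity as (sin, cos) so that, e.g., 23:00 and 00:00 are close."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)


def elapsed_years(times, start):
    """Time-domain feature in years (assumed scaling, so it does not swamp the cycles)."""
    return np.array([(t - start).total_seconds() / (3600 * 8760) for t in times])


def seasonal_features(times):
    """Hypothetical yearly, seasonal, weekly and daily features per timestamp."""
    doy = [t.timetuple().tm_yday - 1 for t in times]  # yearly cycle
    dow = [t.weekday() for t in times]  # weekly cycle
    hod = [t.hour for t in times]  # daily cycle
    season = np.array([(t.month % 12) // 3 for t in times])  # 0=DJF, 1=MAM, 2=JJA, 3=SON
    cols = [*cyclic(doy, 365.25), *cyclic(dow, 7), *cyclic(hod, 24)]
    cols += [(season == s).astype(float) for s in range(4)]  # one-hot season
    return np.column_stack(cols)


def sknn_features(times, start):
    """Time-domain feature enriched with the seasonal features."""
    return np.column_stack([elapsed_years(times, start), seasonal_features(times)])


def knn_impute(features, y, k=5):
    """Fill NaNs in y with the mean of y over the k nearest observed rows (Euclidean)."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features[:, None]
    y = np.asarray(y, dtype=float)
    out = y.copy()
    observed = np.flatnonzero(~np.isnan(y))
    for i in np.flatnonzero(np.isnan(y)):
        dist = np.linalg.norm(features[observed] - features[i], axis=1)
        nearest = observed[np.argsort(dist)[:k]]
        out[i] = y[nearest].mean()
    return out


# Synthetic demo: a weekly-periodic hourly series with one 24-hour continuous gap.
start = datetime(2023, 3, 6)
hours = np.arange(24 * 7 * 8)
times = [start + timedelta(hours=int(h)) for h in hours]
truth = 100 + 20 * np.sin(2 * np.pi * hours / 24) + 10 * np.cos(2 * np.pi * hours / 168)
y = truth.copy()
y[672:696] = np.nan
gap = np.isnan(y)
baseline = knn_impute(elapsed_years(times, start), y, k=4)
sknn = knn_impute(sknn_features(times, start), y, k=4)
baseline_mae = np.mean(np.abs(baseline[gap] - truth[gap]))
sknn_mae = np.mean(np.abs(sknn[gap] - truth[gap]))
print(f"time-domain KNN MAE: {baseline_mae:.2f}, SKNN-style MAE: {sknn_mae:.2e}")
```

With a continuous gap, the time-domain baseline can only average the hours bordering the gap, whereas the enriched features pull in the same hour on the same weekday from neighboring weeks.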
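The two clipping schemes and the error metrics mentioned in the abstract can be sketched as follows. The sampling details are assumptions, since the abstract does not give them: random clipping hides uniformly chosen positions, and continuous clipping hides one contiguous block at a random start. The helper names are hypothetical; MAE and RMSE are the standard definitions, evaluated only over the clipped positions.

```python
import numpy as np


def random_clip_mask(n: int, share: float, rng: np.random.Generator) -> np.ndarray:
    """Random clipping (assumed scheme): hide `share` of n samples, chosen uniformly."""
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=round(share * n), replace=False)] = True
    return mask


def continuous_clip_mask(n: int, share: float, rng: np.random.Generator) -> np.ndarray:
    """Continuous clipping (assumed scheme): hide one contiguous block of `share` * n samples."""
    length = round(share * n)
    start = rng.integers(0, n - length + 1)
    mask = np.zeros(n, dtype=bool)
    mask[start:start + length] = True
    return mask


def mae(truth: np.ndarray, imputed: np.ndarray, mask: np.ndarray) -> float:
    """Mean Absolute Error over the clipped (masked) positions only."""
    return float(np.mean(np.abs(truth[mask] - imputed[mask])))


def rmse(truth: np.ndarray, imputed: np.ndarray, mask: np.ndarray) -> float:
    """Root Mean Square Error over the clipped (masked) positions only."""
    return float(np.sqrt(np.mean((truth[mask] - imputed[mask]) ** 2)))
```

For example, `random_clip_mask(1000, 0.7, np.random.default_rng(0))` hides 700 of 1000 samples, matching the most extreme pruning level mentioned in the abstract.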

Keywords