Frontiers in Energy Research (Jul 2021)

A Flexible Ensemble Algorithm for Big Data Cleaning of PMUs

  • Long Shen,
  • Xin He,
  • Mingqun Liu,
  • Risheng Qin,
  • Cheng Guo,
  • Xian Meng,
  • Ruimin Duan

DOI
https://doi.org/10.3389/fenrg.2021.695057
Journal volume & issue
Vol. 9

Abstract

As phasor measurement units (PMUs) are deployed more widely in the smart grid, they increasingly operate under severe conditions, which produce outliers and missing data. Conventional techniques, lacking the support of a big data platform, take excessive time to clean outliers and fill in missing data. This paper proposes a flexible ensemble algorithm that performs precise and scalable data cleaning on the existing big data platform Apache Spark. In the proposed scheme, an ensemble model based on soft voting combines principal component analysis with K-means, a Gaussian mixture model, and an isolation forest to detect outliers. After outliers are detected, a gradient boosting decision tree is used for each extracted PMU feature to fill in the data. Test results on simulated and real-world PMU data show that the proposed model achieves high accuracy and recall compared with the local outlier factor algorithm and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The data-filling results are validated against contemporary techniques such as decision tree and linear regression algorithms using the mean absolute error, root mean square error, and R² score.
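
The soft voting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how detector outputs are scaled or weighted, so the min-max scaling, equal default weights, and 0.5 threshold below are assumptions. The function names (`min_max_scale`, `soft_vote`) are hypothetical, and the three score lists are made-up values standing in for the outputs of K-means, Gaussian mixture model, and isolation forest detectors, which are not implemented here.

```python
from typing import List, Optional, Sequence


def min_max_scale(scores: Sequence[float]) -> List[float]:
    """Rescale raw anomaly scores to [0, 1]; higher means more anomalous."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # constant scores carry no ranking information
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def soft_vote(
    detector_scores: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.5,
) -> List[bool]:
    """Flag a sample when the weighted mean of the detectors' scaled
    scores reaches the threshold (assumed rule, equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(detector_scores)
    total = sum(weights)
    scaled = [min_max_scale(s) for s in detector_scores]
    n_samples = len(scaled[0])
    flags = []
    for i in range(n_samples):
        avg = sum(w * s[i] for w, s in zip(weights, scaled)) / total
        flags.append(avg >= threshold)
    return flags


# Made-up raw scores for five samples from three hypothetical detectors.
kmeans_dist = [0.2, 0.3, 0.25, 4.0, 0.28]        # distance to nearest centroid
gmm_neg_loglik = [1.1, 1.0, 1.3, 9.5, 1.2]       # negative log-likelihood
iforest_score = [0.40, 0.42, 0.41, 0.80, 0.45]   # isolation-based anomaly score

flags = soft_vote([kmeans_dist, gmm_neg_loglik, iforest_score])
print(flags)
```

Unlike hard (majority) voting, averaging scaled scores lets one detector's strong evidence offset another's weak evidence. In a Spark deployment the per-detector scores would come from distributed models rather than in-memory lists.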

Keywords