Preprocessing Approach for Power Transformer Maintenance Data Mining Based on k-Nearest Neighbor Completion and Principal Component Analysis

Moïse Manyol; Samuel Eke; Alphonse J. M. Massoma; Alain Biboum; Ruben Mouangue

doi:10.1155/2022/8546588

International Transactions on Electrical Energy Systems (Jan 2022)

Affiliations

Moïse Manyol: Energy, Materials, Modeling, and Methods Research Laboratory (LE3M)
Samuel Eke: Energy, Materials, Modeling, and Methods Research Laboratory (LE3M)
Alphonse J. M. Massoma: Energy, Materials, Modeling, and Methods Research Laboratory (LE3M)
Alain Biboum: Mechanical and Industrial Engineering Department
Ruben Mouangue: Energy, Materials, Modeling, and Methods Research Laboratory (LE3M)

Journal volume & issue: Vol. 2022

Abstract

The accuracy of a knowledge extraction algorithm applied to a large database depends on the quality of the data preprocessing and on the methods used. The massive amounts of data collected every day put storage capacity at a premium. In practice, many databases contain attributes with outliers, redundant values, and, even more often, missing values. Missing data and outliers are ubiquitous, and imputation techniques help mitigate their influence. To address this problem, as well as the problem of data size, this paper proposes a data preprocessing approach based on k-nearest neighbor (KNN) completion for imputing missing data and principal component analysis (PCA) for handling redundant data, thereby reducing the data size while producing a sample of good quality after imputation of missing and outlier data. A rigorous comparison is made between this approach and two others. The dissolved gas data from Rio Tinto Alcan’s transformer T0001 were imputed by KNN with k = 5. For the 6 imputed gases, the average percentage error is about 2% with KNN, compared with 17.5% after average imputation and 23.65% after multiple imputation. For data compression, 2 axes were selected based on the elbow rule and the Kaiser threshold.
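
As a rough illustration of the KNN completion step described in the abstract, the following minimal Python sketch fills each missing value with the mean of that attribute over the k nearest records. It is not the authors' implementation: the function names `knn_impute` and `percentage_error`, the rescaled Euclidean distance over shared attributes, the unweighted neighbor mean, and the sample values are all assumptions made for this example. The values are invented and are not taken from transformer T0001.

```python
import math


def knn_impute(rows, k=5):
    """Return a copy of rows in which each None is replaced by the mean
    of that column over the k nearest rows that observe the column.

    Assumption of this sketch: distance is Euclidean over the columns
    both rows observe, rescaled by the share of columns compared.
    """
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(n_cols)
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                sq = sum((row[c] - other[c]) ** 2 for c in shared)
                dist = math.sqrt(sq * n_cols / len(shared))
                candidates.append((dist, other[j]))
            # Sort by distance only; ties keep their original order.
            candidates.sort(key=lambda pair: pair[0])
            nearest = [value for _, value in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def percentage_error(true_value, imputed_value):
    """Absolute error of an imputed value, as a percentage of the true value."""
    return 100.0 * abs(imputed_value - true_value) / abs(true_value)


# Hypothetical records: two attributes per sample, one value missing.
samples = [
    [10.0, 100.0],
    [12.0, 110.0],
    [11.0, None],
    [50.0, 500.0],
]
completed = knn_impute(samples, k=2)
print(completed[2][1])
```

With k = 2 in this toy data, the missing entry is filled from the two closest records and the distant fourth record is ignored. Comparing imputed values with deliberately hidden true values through `percentage_error` is one way to obtain an average percentage error of the kind reported in the abstract.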

Published in International Transactions on Electrical Energy Systems

ISSN: 2050-7038 (Online)
Publisher: Hindawi-Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://onlinelibrary.wiley.com/journal/itees
