IEEE Access (Jan 2020)
Feature Selection Based on Random Forest for Partial Discharges Characteristic Set
Abstract
A combined feature set for partial discharge (PD) pattern recognition has a high dimension, which increases the required sample size, storage space, and computation. The feature set also includes features that carry little category-related information and features that are redundant with one another. To address the high feature dimension and the complicated classification model required to identify the type of PD insulation defect, this paper proposes a random forest sequential forward selection method based on variance analysis (RF-VA) for optimal subset selection. The method improves on the original algorithm in two respects. First, variance analysis is used to measure feature differences between categories, yielding a modified permutation (rearrangement) scheme that guides how feature values are reordered in the out-of-bag data samples. Second, sequential forward search is used for feature selection and produces an evaluation result at each iteration, which removes the randomness in determining the feature subset size and the instability of results found in the original algorithm. The results show that RF-VA obtains a better feature subset, making it feasible to reduce the dimension of the PD characteristic set while effectively improving the identification rate of PD defect types.
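The two ideas named in the abstract can be illustrated with a minimal sketch; it is not the RF-VA implementation. It assumes the variance analysis can be approximated by a one-way ANOVA F statistic per feature, and it replaces the random forest with a toy nearest-centroid classifier so the example needs only the standard library. The function names (`anova_f`, `centroid_accuracy`, `forward_select`) and the six-sample data set are hypothetical.

```python
from statistics import fmean

def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0  # not enough classes or samples to compare
    grand = fmean(values)
    means = {c: fmean(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def centroid_accuracy(X, y, subset):
    """Toy stand-in for the classifier: nearest-centroid training accuracy."""
    cents = {}
    for c in set(y):
        rows = [r for r, lab in zip(X, y) if lab == c]
        cents[c] = [fmean(r[j] for r in rows) for j in subset]
    hits = 0
    for r, lab in zip(X, y):
        pred = min(cents, key=lambda c: sum((r[j] - m) ** 2
                                            for j, m in zip(subset, cents[c])))
        hits += pred == lab
    return hits / len(y)

def forward_select(ranked, score):
    """Add features in ranked order and keep the best-scoring subset."""
    subset, best_subset, best_score = [], [], float("-inf")
    for j in ranked:
        subset.append(j)
        s = score(subset)
        if s > best_score:  # strict '>' prefers the smaller subset on ties
            best_subset, best_score = list(subset), s
    return best_subset, best_score

# Hypothetical data: feature 0 separates the two classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [3.0, 4.0], [3.2, 5.0], [2.8, 3.0]]
y = [0, 0, 0, 1, 1, 1]

scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
best_subset, best_score = forward_select(ranked, lambda s: centroid_accuracy(X, y, s))
```

Because every candidate subset is scored as it is built, the subset size follows from the evaluation results and does not have to be fixed in advance, which is the property the abstract attributes to the sequential forward search.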
Keywords