IEEE Access (Jan 2022)
Optimized Feature Selection Based on a Least-Redundant and Highest-Relevant Framework for a Solar Irradiance Forecasting Model
Abstract
Exogenous and endogenous variables are typically evaluated several times during the selection trial of a predictive model for Global Horizontal Irradiance (GHI). This is accomplished using various statistical measures (e.g., univariate statistical analysis and correlation analysis) that gauge the redundancy and relevancy of specific variables. The main benefits of these approaches include low computational cost, fast screening, accurate measurement of the degree of linear and monotonic association between variable pairs, and the removal of features with low relevance. However, they cannot identify cases where single predictor variables or groups of them are non-monotonically associated with the response variable, nor can they discern whether variables are predictive in isolation or only in combination with other variables. The present study attempts to overcome these challenges by first describing how monotonic and non-monotonic correlation statistics (Spearman’s rho and Hoeffding’s D, respectively) can be used in combination to locate groups with major non-monotonic changes in the endogenous variables. The novelty of the proposed work is a subset evaluation that determines relevance using Weather Recursive Feature Elimination (WRFE), a hybrid feature reduction method that optimizes feature selection using a Least-Redundant/Highest-Relevant framework. The proposed WRFE uses feature importance, measured as variance reduction in Random Forest Regression (RFR) and through data perturbation in Long Short-Term Memory (LSTM). The simulation results for hourly GHI predictions demonstrate that the proposed optimal features of the training subset make the greatest contributions to the prediction target, and show that the high variability of irradiance conditions lowers training subset reliability. The results show that the proposed WRFE outperforms the other models, with an RMSE of 1.0927% and an $R^{2}$ coefficient exceeding 98%.
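To illustrate why a monotonic statistic and a non-monotonic statistic are useful together, the following minimal sketch computes Spearman's rho and Hoeffding's D on a synthetic parabolic relationship. It is not the authors' implementation: the helper names (`ranks`, `spearman_rho`, `hoeffding_d`) and the toy data are assumptions made for illustration, the rank helper assumes no tied values, and D uses the common scaling in which a perfectly monotonic sample yields 1.

```python
def ranks(values):
    """Return 1-based ranks; assumes no tied values (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def hoeffding_d(x, y):
    """Hoeffding's D scaled by 30, for samples without ties (n >= 5)."""
    n = len(x)
    if n < 5:
        raise ValueError("Hoeffding's D needs at least 5 observations")
    r, s = ranks(x), ranks(y)
    # Bivariate rank: 1 + number of points strictly below in both coordinates.
    q = [
        1 + sum(1 for j in range(n) if x[j] < x[i] and y[j] < y[i])
        for i in range(n)
    ]
    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))
    numerator = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    denominator = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30 * numerator / denominator


# Toy data (hypothetical): a parabola, offset so that no y values are tied.
x = [i - 10.3 for i in range(21)]
y = [v * v for v in x]

rho = spearman_rho(x, y)  # close to zero: no monotonic trend
d = hoeffding_d(x, y)     # clearly positive: dependence is still detected
print(f"Spearman rho = {rho:.4f}, Hoeffding D = {d:.4f}")
```

On this toy data the monotonic statistic is close to zero while Hoeffding's D remains clearly positive, which is the kind of disagreement that can flag a non-monotonic association for closer inspection.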
Keywords