IEEE Access (Jan 2020)
An Improved Forward Regression Variable Selection Algorithm for High-Dimensional Linear Regression Models
Abstract
Variable selection plays an important role in fields such as process modeling and process monitoring. Such problems generally involve a large number of predictor variables, with the number of predictors d often much larger than the sample size n. How to filter useful variables and extract useful information in a high-dimensional setting is therefore a critical issue in the era of big data. This paper proposes an improved Forward Regression algorithm for variable selection in the high-dimensional setting. By introducing a predefined stopping rule, the proposed method performs well in selecting the relevant variables. The stopping rule links the residual sum of squares to the noise ratio so that relevant predictors can be distinguished from random noise. Theoretical analysis and simulations confirm that the improved Forward Regression algorithm can identify the relevant predictors and thus achieve selection consistency. Compared with the traditional Forward Regression method, the proposed algorithm can improve prediction accuracy and reduce computational cost by selecting only the relevant variables.
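As a rough illustration of the idea, greedy forward regression with an RSS-based stopping rule might be sketched as follows. The abstract does not state the exact form of the stopping rule, so the relative-RSS-drop threshold used here (`rss_drop_tol`) is a hypothetical stand-in and not the rule proposed in the paper; the function name and parameters are likewise assumptions made for illustration.

```python
import numpy as np


def forward_regression(X, y, rss_drop_tol=0.3, max_steps=None):
    """Greedy forward selection; returns selected column indices in order.

    rss_drop_tol is a HYPOTHETICAL threshold (not the paper's rule):
    selection stops once the best remaining candidate reduces the
    residual sum of squares (RSS) by less than this fraction of the
    current RSS.
    """
    n, d = X.shape
    if max_steps is None:
        max_steps = min(n - 1, d)

    # Center the data so that no explicit intercept column is needed.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    selected = []
    rss = float(yc @ yc)  # RSS of the intercept-only model

    for _ in range(max_steps):
        best_j, best_rss = None, rss
        for j in range(d):
            if j in selected:
                continue
            cols = selected + [j]
            # Refit least squares on the current set plus candidate j.
            beta, *_ = np.linalg.lstsq(Xc[:, cols], yc, rcond=None)
            resid = yc - Xc[:, cols] @ beta
            cand_rss = float(resid @ resid)
            if cand_rss < best_rss:
                best_j, best_rss = j, cand_rss

        # Stop when no candidate helps, or when the relative RSS drop
        # is too small to be distinguished from noise (assumed rule).
        if best_j is None or (rss - best_rss) < rss_drop_tol * rss:
            break
        selected.append(best_j)
        rss = best_rss

    return selected
```

A call such as `forward_regression(X, y, rss_drop_tol=0.3)` on an n-by-d design matrix with d > n returns the indices chosen before the rule fires. Stopping early avoids scanning for further variables once the remaining RSS reductions look like noise, which is where the savings in computation over running forward regression to a fixed model size would come from.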
Keywords