IEEE Access (Jan 2020)

An Improved Forward Regression Variable Selection Algorithm for High-Dimensional Linear Regression Models

  • Yanxi Xie,
  • Yuewen Li,
  • Zhijie Xia,
  • Ruixia Yan

DOI
https://doi.org/10.1109/ACCESS.2020.3009377
Journal volume & issue
Vol. 8
pp. 129032 – 129042

Abstract

Variable selection plays an important role in fields such as process modeling and process monitoring. These problems typically involve a large number of predictor variables, often with the number of predictors d much larger than the sample size n. How to filter useful variables and extract useful information in this high-dimensional setting is therefore a critical issue in the era of big data. This paper proposes an improved Forward Regression algorithm for variable selection in the high-dimensional setting. The improved method performs well in selecting relevant variables by introducing a predefined stopping rule. The stopping rule links the residual sum of squares to the noise ratio so that relevant predictors can be distinguished from random noise. Theoretical analysis and simulations confirm that the improved Forward Regression algorithm identifies the relevant predictors and achieves selection consistency. Compared with the traditional Forward Regression method, the proposed algorithm improves prediction accuracy and reduces computational cost by selecting only the relevant variables.
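
The abstract does not state the exact form of the stopping rule, so the following is only a minimal sketch of generic forward regression with a stand-in rule, not the authors' algorithm. It assumes the search stops once the best remaining candidate reduces the residual sum of squares by less than a fixed fraction; the function name forward_regression, the parameter rel_tol and its default of 0.15, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def forward_regression(X, y, rel_tol=0.15, max_steps=None):
    """Greedy forward selection with a relative-RSS stopping rule.

    rel_tol is a hypothetical tuning parameter standing in for the
    paper's stopping rule, whose exact form the abstract does not give.
    """
    n, d = X.shape
    if max_steps is None:
        max_steps = min(n - 2, d)
    ones = np.ones((n, 1))              # intercept column
    selected = []
    tss = float(np.sum((y - y.mean()) ** 2))
    rss = tss                           # RSS of the intercept-only model
    for _ in range(max_steps):
        if rss <= 1e-12 * max(tss, 1.0):
            break                       # already an (almost) exact fit
        best_j, best_rss = None, rss
        for j in range(d):
            if j in selected:
                continue
            # Least-squares fit on the current set plus candidate j.
            A = np.hstack([ones, X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            cand_rss = float(resid @ resid)
            if cand_rss < best_rss:
                best_j, best_rss = j, cand_rss
        # Assumed rule: stop when the best candidate lowers the RSS by
        # less than a fraction rel_tol of its current value.
        if best_j is None or (rss - best_rss) / rss < rel_tol:
            break
        selected.append(best_j)
        rss = best_rss
    return selected, rss


# Simulated example with d > n: three relevant predictors among 300.
rng = np.random.default_rng(0)
n, d = 200, 300
X = rng.standard_normal((n, d))
true_idx = [3, 17, 42]
y = 3.0 * X[:, true_idx].sum(axis=1) + 0.5 * rng.standard_normal(n)

selected, rss = forward_regression(X, y)
print(sorted(selected), rss)
```

In this sketch the threshold has to sit above the largest RSS reduction that a pure-noise predictor can produce by chance, which grows with d and shrinks with n. A fixed fraction is therefore a crude substitute for a rule tied to the noise level, as the abstract describes.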

Keywords