PeerJ Computer Science (Aug 2023)

A comparative study of different variable selection methods based on numerical simulation and empirical analysis

  • Dake Hou,
  • Wenli Zhou,
  • Qiuxia Zhang,
  • Kun Zhang,
  • Jiaqi Fang

DOI
https://doi.org/10.7717/peerj-cs.1522
Journal volume & issue
Vol. 9
p. e1522

Abstract


This study draws on the principles of computer science and statistics to evaluate the linear random effect model under Lasso-type variable selection techniques (Lasso, Elastic-Net, Adaptive-Lasso, and SCAD), using both numerical simulation and empirical research. The analysis focuses on each model's variable selection consistency, prediction accuracy, stability, and efficiency. The study introduces a novel way to assess variable selection consistency across models: the angle between the true coefficient vector $\beta$ and the estimated coefficient vector $\hat{\beta}$ is computed to quantify their degree of consistency. Boxplots are used to visualize the distributions of prediction accuracy and variable selection consistency, and the relative stability of each model is assessed from the frequency of outliers. Comparative numerical simulation experiments evaluate the proposed model evaluation method against commonly used analysis methods. The results demonstrate the effectiveness and correctness of the proposed method and show that it offers a convenient way to analyze the stability and efficiency of each fitted model.
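The abstract does not spell out the formula, so the following is a minimal sketch assuming the standard vector angle, θ = arccos(β·β̂ / (‖β‖‖β̂‖)), reported in degrees, together with the conventional Tukey boxplot rule (points beyond Q1 − 1.5·IQR or Q3 + 1.5·IQR) as the outlier definition used to compare stability. The function names, the degree unit, the 1.5 whisker factor, and the choice to raise an error when a penalty shrinks every coefficient to zero are all illustrative assumptions, not the authors' code. Plotting libraries may compute quartiles slightly differently, so outlier counts can vary at the margins.

```python
import math
import statistics
from typing import Sequence


def coefficient_angle(beta_true: Sequence[float], beta_hat: Sequence[float]) -> float:
    """Angle in degrees between the true and the estimated coefficient vectors.

    Hypothetical helper: 0 means beta_hat points in exactly the same direction
    as beta_true; 90 means the two vectors are orthogonal.
    """
    if len(beta_true) != len(beta_hat):
        raise ValueError("coefficient vectors must have the same length")
    dot = math.fsum(b * e for b, e in zip(beta_true, beta_hat))
    norm_true = math.sqrt(math.fsum(b * b for b in beta_true))
    norm_hat = math.sqrt(math.fsum(e * e for e in beta_hat))
    if norm_true == 0 or norm_hat == 0:
        # Assumption: treat an all-zero fit (e.g. a very strong penalty) as undefined.
        raise ValueError("angle is undefined for a zero coefficient vector")
    cos_theta = dot / (norm_true * norm_hat)
    # Clip so floating-point round-off cannot push |cos| slightly above 1.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def count_boxplot_outliers(values: Sequence[float], whisker: float = 1.5) -> int:
    """Count values outside the Tukey whiskers [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper; 'inclusive' quartiles use linear interpolation.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return sum(1 for v in values if v < lower or v > upper)


if __name__ == "__main__":
    # Illustrative vectors only, not data from the study.
    beta_true = [3.0, 1.5, 0.0, 0.0, 2.0]
    beta_hat = [2.8, 1.3, 0.0, 0.1, 1.9]
    print(coefficient_angle(beta_true, beta_hat))
```

In a simulation study, one would compute this angle for each replication and each method, draw one boxplot per method, and count the outliers in each box; under this reading, fewer outliers suggest a more stable method.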

Keywords