Scientific Reports (Jun 2022)

Robust model selection using the out-of-bag bootstrap in linear regression

  • Fazli Rabbi,
  • Alamgir Khalil,
  • Ilyas Khan,
  • Muqrin A. Almuqrin,
  • Umair Khalil,
  • Mulugeta Andualem

DOI
https://doi.org/10.1038/s41598-022-14398-1
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 10

Abstract

Outlying observations have a large influence on the linear model selection process. In this article, we present a novel approach to robust model selection in linear regression for data that contain outliers. The model selection criterion is based on two components: the robust conditional expected prediction loss, and a robust goodness-of-fit measure with a penalty term. We estimate the conditional expected prediction loss using the out-of-bag stratified bootstrap. In the presence of outliers, the stratified bootstrap ensures that the bootstrap samples are similar to the original sample. Furthermore, to control the undue effect of outliers, we use the robust MM-estimator and a bounded loss function in the proposed criterion. Specifically, we observe that minimizing the penalized loss function and the conditional expected prediction loss simultaneously works better than minimizing either one separately. Simulation and real-data studies confirm the consistent and satisfactory behavior of our bootstrap model selection procedure in the presence of both response outliers and covariate outliers.
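
The out-of-bag stratified bootstrap estimate of prediction loss mentioned in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the stratification rule, the tuning constants, or the exact form of the criterion, so everything specific here is hypothetical: the function names (`tukey_loss`, `mad_scale`, `robust_fit`, `oob_loss`), the Tukey bisquare loss with constant 4.685, and the use of a plain iteratively reweighted least-squares M-estimator as a stand-in for the MM-estimator. Only the prediction-loss component is sketched; the penalized goodness-of-fit component is omitted.

```python
import numpy as np

def tukey_loss(r, c=4.685):
    """Bounded Tukey bisquare loss; constant at c**2 / 6 once |r| > c."""
    r = np.asarray(r, dtype=float)
    cap = c ** 2 / 6.0
    return np.where(np.abs(r) <= c, cap * (1.0 - (1.0 - (r / c) ** 2) ** 3), cap)

def mad_scale(r):
    """Median absolute deviation, rescaled to match the normal standard deviation."""
    return 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12

def robust_fit(X, y, c=4.685, n_iter=50):
    """Bisquare M-estimate via IRLS (assumed stand-in for the MM-estimator)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares starting value
    for _ in range(n_iter):
        r = y - X @ beta
        u = r / mad_scale(r)
        # Bisquare weights: observations with |u| > c get weight zero.
        w = np.where(np.abs(u) <= c, (1.0 - (u / c) ** 2) ** 2, 0.0)
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

def oob_loss(X, y, strata, scale, n_boot=200, seed=0):
    """Average bounded loss on out-of-bag points over stratified bootstrap draws.

    `strata` holds an integer label per observation (assumption: e.g. flagged
    outliers versus the rest); `scale` is a common residual scale, so that
    candidate models are compared on the same footing.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    groups = [np.flatnonzero(strata == s) for s in np.unique(strata)]
    losses = []
    for _ in range(n_boot):
        # Resample with replacement inside each stratum, keeping stratum sizes.
        idx = np.concatenate([rng.choice(g, size=g.size, replace=True) for g in groups])
        oob = np.setdiff1d(np.arange(n), idx)  # points never drawn in this replicate
        if oob.size == 0:
            continue
        beta = robust_fit(X[idx], y[idx])
        losses.append(tukey_loss((y[oob] - X[oob] @ beta) / scale).mean())
    return float(np.mean(losses))
```

In use, one would compute `oob_loss` for the design matrix of each candidate model, with `scale` and `strata` taken from a robust fit of the largest model, and prefer candidates with a smaller value. Because the loss is bounded, a few gross outliers among the out-of-bag points cannot dominate the estimate.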