IEEE Access (Jan 2021)

Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression

  • Fernando Jimenez,
  • Estrella Lucena-Sanchez,
  • Gracia Sanchez,
  • Guido Sciavicco

DOI
https://doi.org/10.1109/ACCESS.2021.3115848
Journal volume & issue
Vol. 9
pp. 135675 – 135688

Abstract

When investigating the causes of contamination in specific contexts, such as in underground water wells, multivariate regression is commonly used to establish possible links between the chemical-physical values of the samples and the levels of contaminant. Two issues often arise from such a statistical analysis: selecting the best predicting variables and detecting the instances that can be suspected to be outliers. In this paper, we propose a comprehensive, integrated, and general optimization model that solves these two problems simultaneously in such a way that outliers can be detected in reference to the specific variables that are selected for the regression, and we implement such an optimization model with a well-known evolutionary algorithm. We test our proposal on data extracted from a project whose aim is to establish the causes of the contamination of underground water wells in a very specific area of northeastern Italy. The results show that our variable selection and outlier detection algorithm allows the synthesis of very reliable, interpretable, and clean regression models.
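
The idea of selecting variables and flagging outliers in a single candidate solution can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not state the objectives or the evolutionary algorithm used, so the three objectives below (regression error, number of selected variables, number of removed instances), the ordinary least-squares fit, the synthetic data, and the names `evaluate` and `dominates` are all assumptions made for illustration. A candidate is a pair of Boolean masks, one over variables and one over instances, and a multi-objective search would compare candidates by Pareto dominance.

```python
import numpy as np

def evaluate(feature_mask, instance_mask, X, y):
    """Score one candidate: (RMSE, variables used, instances removed).

    All three values are to be minimized (assumed objectives, not the paper's).
    """
    # Keep only instances not flagged as outliers, and only selected variables.
    Xs = X[instance_mask][:, feature_mask]
    ys = y[instance_mask]
    n_features = int(feature_mask.sum())
    n_removed = int((~instance_mask).sum())
    # Reject degenerate candidates that cannot support a regression fit.
    if n_features == 0 or len(ys) <= n_features + 1:
        return (float("inf"), n_features, n_removed)
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(ys)), Xs])
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
    rmse = float(np.sqrt(np.mean((A @ coef - ys) ** 2)))
    return (rmse, n_features, n_removed)

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere, better somewhere."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))

# Synthetic data (hypothetical): y depends on variables 0 and 1 only,
# and the first three instances are shifted to act as outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=60)
y[:3] += 50.0

keep_all = np.ones(60, dtype=bool)
naive = evaluate(np.array([True, True, True]), keep_all, X, y)

drop_first_three = keep_all.copy()
drop_first_three[:3] = False
clean = evaluate(np.array([True, True, False]), drop_first_three, X, y)

print("naive:", naive)
print("clean:", clean)
print("clean dominates naive:", dominates(clean, naive))
```

In this toy setting the second candidate fits far better with fewer variables, but only by discarding instances, so neither candidate dominates the other. That trade-off is the reason a multi-objective formulation returns a set of non-dominated solutions rather than a single model.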

Keywords