Framework for the Ensemble of Feature Selection Methods

Maritza Mera-Gaona; Diego M. López; Rubiel Vargas-Canas; Ursula Neumann

doi:10.3390/app11178122

Applied Sciences (Sep 2021)

Affiliations

Maritza Mera-Gaona: Faculty of Electronic Engineering and Telecommunications, Campus Tulcan, University of Cauca, Popayán 190001, Colombia
Diego M. López: Faculty of Electronic Engineering and Telecommunications, Campus Tulcan, University of Cauca, Popayán 190001, Colombia
Rubiel Vargas-Canas: Faculty of Electronic Engineering and Telecommunications, Campus Tulcan, University of Cauca, Popayán 190001, Colombia
Ursula Neumann: Group Data Science, Division Supply Chain Services SCS, Fraunhofer IIS, Fraunhofer Institute for Integrated Circuits IIS, 90411 Nuremberg, Germany

DOI: https://doi.org/10.3390/app11178122
Journal volume & issue: Vol. 11, no. 17, p. 8122

Abstract

Feature selection (FS) has attracted the attention of many researchers in the last few years due to the increasing sizes of datasets, which contain hundreds or thousands of columns (features). Typically, not all columns represent relevant values. Consequently, noisy or irrelevant columns can confuse the algorithms, leading to weak performance of machine learning models. Different FS algorithms have been proposed to analyze high-dimensional datasets and determine their subsets of relevant features to overcome this problem. However, very often, FS algorithms are biased by the data. Thus, ensemble feature selection (EFS) methods have become an alternative that integrates the advantages of single FS algorithms and compensates for their disadvantages. The objective of this research is to propose a conceptual and implementation framework to understand the main concepts and relationships in the process of aggregating FS algorithms and to demonstrate how to address FS on datasets with high dimensionality. The proposed conceptual framework is validated by deriving an implementation framework, which incorporates a set of Python packages with functionalities to support the assembly of feature selection algorithms. The performance of the implementation framework was demonstrated in several experiments discovering relevant features in the Sonar, SPECTF, and WDBC datasets. The experiments contrasted the accuracy of two machine learning classifiers (decision tree and logistic regression), trained with subsets of features generated either by single FS algorithms or the set of features selected by the ensemble feature selection framework. We observed that for the three datasets used (Sonar, SPECTF, and WDBC), the highest precision percentages (86.95%, 74.73%, and 93.85%, respectively) were obtained when the classifiers were trained with the subset of features generated by our framework.
Additionally, the stability of the feature sets generated using our ensemble method was evaluated. The results showed that the method achieved perfect stability for the three datasets used in the evaluation.
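
The two ideas in the abstract, aggregating the output of several FS algorithms and measuring the stability of the resulting feature sets, can be illustrated with a minimal Python sketch. It is not the authors' framework: the abstract does not say which aggregation rule or stability measure the framework uses, so the Borda count and the mean pairwise Jaccard similarity below are assumptions chosen for illustration, and the function names and feature labels are hypothetical.

```python
from itertools import combinations


def borda_aggregate(rankings, k):
    """Combine several feature rankings (best feature first) by Borda count.

    Assumption: every ranking lists the same features. A feature at
    position p in a ranking of n features earns n - p points.
    Returns the k features with the highest total score.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - position)
    # Sort by descending score; break ties by feature name for determinism.
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return ordered[:k]


def jaccard_stability(subsets):
    """Mean pairwise Jaccard similarity of feature subsets (1.0 = identical)."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 1.0
    total = 0.0
    for a, b in pairs:
        union = a | b
        # Two empty subsets are treated as identical.
        total += len(a & b) / len(union) if union else 1.0
    return total / len(pairs)


# Hypothetical rankings produced by three single FS algorithms.
rankings = [
    ["f3", "f1", "f2", "f4"],
    ["f1", "f3", "f4", "f2"],
    ["f3", "f2", "f1", "f4"],
]
selected = borda_aggregate(rankings, k=2)
print(selected)

# Stability across repeated runs (e.g., on resampled data): identical
# subsets in every run give a score of 1.0.
runs = [set(selected), set(selected), set(selected)]
print(jaccard_stability(runs))
```

In this sketch, a stability score of 1.0 means the ensemble returned the same feature subset in every run, which is one way to read the "perfect stability" reported above; the paper itself may define stability differently.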

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci