Feature engineering through two-level genetic algorithm

Aditi Gulati; Armin Felahatpisheh; Camilo E. Valderrama

doi:10.1016/j.mlwa.2025.100696

Machine Learning with Applications (Sep 2025)

Affiliations

Aditi Gulati: Computer Science and Engineering Department, Indira Gandhi Delhi Technical University for Women, Delhi, 110006, India
Armin Felahatpisheh: Department of Applied Computer Science, University of Winnipeg, 515 Portage Avenue, Winnipeg, R3B 2E9, MB, Canada
Camilo E. Valderrama: Department of Applied Computer Science, University of Winnipeg, 515 Portage Avenue, Winnipeg, R3B 2E9, MB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, 3280 Hospital Drive NW, Calgary, T2N 4Z6, AB, Canada; Corresponding author at: Department of Applied Computer Science, University of Winnipeg, 515 Portage Avenue, Winnipeg, R3B 2E9, MB, Canada.

DOI: https://doi.org/10.1016/j.mlwa.2025.100696
Journal volume & issue: Vol. 21, p. 100696

Abstract

Deep learning models are widely used for their high predictive performance, but often lack interpretability. Traditional machine learning methods, such as logistic regression and ensemble models, offer greater interpretability but typically have lower predictive capacity. Feature engineering can enhance the performance of interpretable models by identifying features that optimize classification. However, existing feature engineering methods face limitations: (1) they usually do not apply non-linear transformations to features, ignoring the benefits of non-linear spaces; (2) they usually perform feature selection only once, failing to reduce uncertainty through repeated experiments; and (3) traditional methods like minimum redundancy maximum relevance (mRMR) require additional hyperparameters to define the number of selected features. To address these issues, this study proposed a hierarchical two-level feature engineering approach. In the first level, relevant features were identified using multiple bootstrapped training sets. For each training set, the features were expanded using seven non-linear transformation functions, and the minimum feature set maximizing ensemble model performance was selected using the Non-Dominated Sorting Genetic Algorithm II (NSGA-II). In the second level, candidate feature sets were aggregated using two strategies. We evaluated our approach on twelve datasets from various fields, achieving an average F1 score improvement of 1.5% while reducing the feature set size by 54.5%. Moreover, our approach outperformed or matched traditional filter-based methods. Our approach is available through a Python library (feature-gen), enabling others to benefit from this tool. This study highlights the utility of evolutionary algorithms to generate feature sets that enhance the performance of interpretable machine learning models.
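The two-level idea described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation and does not use the feature-gen API. The four transformation functions, the helper names (`expand_features`, `bootstrap_indices`, `pareto_front`, `aggregate`), and the frequency-threshold aggregation rule are all assumptions made for illustration; the abstract names neither the seven transformations nor the two aggregation strategies. The sketch shows only the non-dominance criterion that NSGA-II builds on (higher score, fewer features), not the full genetic algorithm.

```python
import math
import random
from collections import Counter

# Hypothetical non-linear transformations (the abstract does not list the seven used).
TRANSFORMS = {
    "square": lambda x: x * x,
    "sqrt_abs": lambda x: math.sqrt(abs(x)),
    "log1p_abs": lambda x: math.log1p(abs(x)),
    "tanh": math.tanh,
}


def expand_features(row):
    """Return the original features plus one transformed copy per function."""
    expanded = dict(row)
    for name, value in row.items():
        for t_name, fn in TRANSFORMS.items():
            expanded[f"{t_name}({name})"] = fn(value)
    return expanded


def bootstrap_indices(n, rng):
    """Sample n row indices with replacement (one bootstrapped training set)."""
    return [rng.randrange(n) for _ in range(n)]


def dominates(a, b):
    """a and b are (score, n_features); higher score and fewer features are better."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """candidates: list of (feature_set, score). Keep the non-dominated ones."""
    objectives = [(score, len(features)) for features, score in candidates]
    front = []
    for i, candidate in enumerate(candidates):
        dominated = any(
            dominates(objectives[j], objectives[i])
            for j in range(len(candidates))
            if j != i
        )
        if not dominated:
            front.append(candidate)
    return front


def aggregate(feature_sets, min_fraction=0.5):
    """Assumed second-level rule: keep features chosen in at least
    min_fraction of the bootstrap runs."""
    counts = Counter(f for features in feature_sets for f in set(features))
    threshold = min_fraction * len(feature_sets)
    return sorted(f for f, c in counts.items() if c >= threshold)
```

In this sketch, each bootstrap run would expand its rows with `expand_features`, score candidate feature subsets with an interpretable model, and keep the `pareto_front`; `aggregate` then combines the subsets chosen across runs. A threshold of 0.5 corresponds to a majority vote, and a threshold close to 0 approaches a union of all selected features.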

Published in Machine Learning with Applications

ISSN: 2666-8270 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General): Cybernetics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.journals.elsevier.com/machine-learning-with-applications
