Symbolic regression as a feature engineering method for machine and deep learning regression tasks

Assaf Shmuel; Oren Glickman; Teddy Lazebnik

doi:10.1088/2632-2153/ad513a

Machine Learning: Science and Technology (Jan 2024)

Affiliations

Assaf Shmuel: Department of Computer Science, Bar Ilan University, Ramat Gan, Israel
Oren Glickman: Department of Computer Science, Bar Ilan University, Ramat Gan, Israel
Teddy Lazebnik: Department of Mathematics, Ariel University, Ariel, Israel; Department of Cancer Biology, Cancer Institute, University College London, London, United Kingdom

DOI: https://doi.org/10.1088/2632-2153/ad513a
Journal volume & issue: Vol. 5, no. 2, p. 025065

Abstract

In machine learning (ML) and deep learning (DL) regression tasks, effective feature engineering (FE) is pivotal to model performance. Traditional FE approaches often rely on domain expertise to manually design features for ML models. In DL models, FE is embedded in the neural network’s architecture, which makes it hard to interpret. In this study, we propose integrating symbolic regression (SR) as an FE step before an ML model to improve its performance. We show, through extensive experiments on synthetic datasets and 21 real-world datasets, that incorporating SR-derived features significantly improves the predictive performance of both ML and DL regression models, with a 34%–86% improvement in root mean square error (RMSE) on synthetic datasets and a 4%–11.5% improvement on real-world datasets. In an additional realistic use case, we show that the proposed method improves ML performance in predicting superconducting critical temperatures based on Eliashberg theory by more than 20% in terms of RMSE. These results outline the potential of SR as an FE component in data-driven models, improving both their performance and their interpretability.
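
To make the idea of SR as an FE step concrete, the following is a minimal, standard-library-only sketch and not the authors' implementation. It assumes a tiny brute-force search over a hypothetical expression library (products, ratios, and squares of the raw features) as a stand-in for a real SR engine, a plain least-squares linear model as the downstream regressor, and a synthetic target chosen purely for illustration. All function names here are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def fit_linear(X, y):
    """Least squares with intercept: solve the normal equations by Gauss-Jordan."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]


def candidate_expressions(n_features):
    """Hypothetical expression library; only nonlinear terms, so the
    augmented design matrix stays full rank for the linear model."""
    cands = {}
    for i in range(n_features):
        cands[f"x{i}**2"] = lambda x, i=i: x[i] ** 2
        for j in range(n_features):
            if i < j:
                cands[f"x{i}*x{j}"] = lambda x, i=i, j=j: x[i] * x[j]
            if i != j:
                cands[f"x{i}/x{j}"] = lambda x, i=i, j=j: x[i] / x[j]
    return cands


def augment(X, expr):
    """Append the value of a symbolic expression as one extra feature column."""
    return [list(r) + [expr(r)] for r in X]


def search_feature(X, y):
    """Pick the expression whose added column gives the lowest training RMSE."""
    best = None
    for name, expr in candidate_expressions(len(X[0])).items():
        Xa = augment(X, expr)
        err = rmse(y, predict(fit_linear(Xa, y), Xa))
        if best is None or err < best[0]:
            best = (err, name, expr)
    return best[1], best[2]


def make_data(n):
    # Assumed synthetic target: an interaction term plus a linear term and noise.
    X = [[random.uniform(1.0, 3.0) for _ in range(3)] for _ in range(n)]
    y = [r[0] * r[1] + 0.5 * r[2] + random.gauss(0.0, 0.05) for r in X]
    return X, y


random.seed(0)
X_train, y_train = make_data(200)
X_test, y_test = make_data(100)

# Baseline: the downstream model sees only the raw features.
w_base = fit_linear(X_train, y_train)
rmse_base = rmse(y_test, predict(w_base, X_test))

# SR-as-FE: search for an expression on the training set, then append it.
best_name, best_expr = search_feature(X_train, y_train)
w_aug = fit_linear(augment(X_train, best_expr), y_train)
rmse_aug = rmse(y_test, predict(w_aug, augment(X_test, best_expr)))

print("selected expression:", best_name)
print("test RMSE, raw features:", round(rmse_base, 4))
print("test RMSE, raw + SR feature:", round(rmse_aug, 4))
```

The selected expression remains visible as a formula, which is the interpretability benefit the abstract refers to. In practice the brute-force search would be replaced by a dedicated SR tool and the linear model by whichever ML or DL regressor is being studied.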

Published in Machine Learning: Science and Technology

ISSN: 2632-2153 (Online)
Publisher: IOP Publishing
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://iopscience.iop.org/journal/2632-2153
