Applied Sciences (Nov 2024)

Bias and Variance Analysis of Contemporary Symbolic Regression Methods

  • Lukas Kammerer,
  • Gabriel Kronberger,
  • Stephan Winkler

DOI
https://doi.org/10.3390/app142311061
Journal volume & issue
Vol. 14, no. 23
p. 11061

Abstract

Symbolic regression is commonly used in domains where both high accuracy and interpretability of models are required. While symbolic regression is capable of producing highly accurate models, small changes in the training data might cause highly dissimilar solutions. The practical implications are significant: interpretability, a key selling point of symbolic regression, degrades when minor changes in the data cause substantially different model behavior. We analyse the perturbations caused by changes in training data for ten contemporary symbolic regression algorithms. To do so, we analyse existing machine learning models from SRBench, a benchmark suite that compares the accuracy of several symbolic regression algorithms. We measure the bias and variance of the algorithms and show that algorithms such as Operon and GP-GOMEA return highly accurate models with similar behavior despite changes in training data. Our results highlight that larger model sizes do not imply different behavior when the training data change. On the contrary, larger models effectively prevent systematic errors. We also show that other algorithms such as ITEA or AIFeynman, whose declared goal is to produce consistent results, live up to their expectation of small and similar models.
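
As a rough illustration of what measuring bias and variance under changing training data can look like, the sketch below repeatedly refits a learner on freshly perturbed training targets and compares its predictions at fixed test points. It is not the procedure used in the paper: the polynomial learner stands in for a symbolic regression algorithm, and the names `bias_variance`, `poly_learner`, the target function, and the noise level are all assumptions made for this example.

```python
import numpy as np

def bias_variance(fit_predict, x_train, f_true, x_test, noise_sd, n_repeats, rng):
    """Estimate squared bias and variance of a learner at fixed test points."""
    preds = np.empty((n_repeats, x_test.size))
    for r in range(n_repeats):
        # Draw fresh noisy training targets to mimic a changed training set.
        y_train = f_true(x_train) + rng.normal(0.0, noise_sd, size=x_train.size)
        preds[r] = fit_predict(x_train, y_train, x_test)
    mean_pred = preds.mean(axis=0)
    # Squared bias: systematic deviation of the average prediction.
    bias_sq = float(np.mean((mean_pred - f_true(x_test)) ** 2))
    # Variance: spread of predictions across the refitted models.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

def poly_learner(degree):
    """Hypothetical stand-in learner: least-squares polynomial of a given degree."""
    def fit_predict(x_tr, y_tr, x_te):
        coeffs = np.polyfit(x_tr, y_tr, degree)
        return np.polyval(coeffs, x_te)
    return fit_predict

def f_true(x):
    # Assumed ground-truth function for the toy problem.
    return np.sin(3.0 * x)

rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 50)
x_test = np.linspace(-0.9, 0.9, 25)

results = {}
for degree in (1, 5):
    results[degree] = bias_variance(
        poly_learner(degree), x_train, f_true, x_test,
        noise_sd=0.1, n_repeats=200, rng=rng,
    )
    b, v = results[degree]
    print(f"degree={degree}: bias^2={b:.4f}, variance={v:.4f}")
```

In this toy setting the more flexible model shows the smaller squared bias, which mirrors the abstract's point that larger models can prevent systematic errors; whether the variance stays low is an empirical question that depends on the learner.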

Keywords