Mathematics (Nov 2024)
Discovery of Exact Equations for Integer Sequences
Abstract
Equation discovery, also known as symbolic regression, is the field of machine learning that studies algorithms for discovering quantitative laws, expressed as closed-form equations or formulas, in collections of observed data. Such data are expected to come from measurements of physical systems and are therefore noisy, which shifts the focus of equation discovery algorithms towards approximate equations. Because these equations only loosely match the noisy observations, they are inappropriate for applications in mathematics. In this article, we introduce Diofantos, an algorithm for discovering equations in the ring of integers that exactly match the training data. Diofantos is based on a reformulation of the equation discovery task as the task of solving linear Diophantine equations. We empirically evaluate the performance of Diofantos on reconstructing known equations for more than 27,000 sequences from the On-Line Encyclopedia of Integer Sequences (OEIS). Diofantos successfully reconstructs more than 90% of these equations and clearly outperforms SINDy, a state-of-the-art method for discovering approximate equations, which achieves a reconstruction rate of less than 70%.
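To make the idea of exact matching concrete, the following is a minimal sketch and not the Diofantos algorithm itself. It assumes the simplest possible setting: a homogeneous linear recurrence a_n = c_1 a_{n-1} + ... + c_k a_{n-k} with integer coefficients, and a linear system that determines those coefficients uniquely. The function names `solve_exact` and `find_integer_recurrence` are hypothetical and chosen for illustration only. Each sequence term contributes one linear equation in the unknown coefficients. The system is solved in exact rational arithmetic, and a candidate is accepted only if it satisfies every equation and all coefficients are integers.

```python
from fractions import Fraction


def solve_exact(rows, rhs):
    """Solve rows * x = rhs in exact rational arithmetic.

    Returns the solution as a list of Fractions if the system is
    consistent and the solution is unique, otherwise None.
    """
    n_cols = len(rows[0])
    # Augmented matrix [rows | rhs] with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(rows, rhs)]
    pivot_row = 0
    for col in range(n_cols):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, len(aug))
                      if aug[r][col] != 0), None)
        if pivot is None:
            return None  # free variable: the solution is not unique
        aug[pivot_row], aug[pivot] = aug[pivot], aug[pivot_row]
        pivot_value = aug[pivot_row][col]
        aug[pivot_row] = [v / pivot_value for v in aug[pivot_row]]
        # Eliminate this column from every other row.
        for r in range(len(aug)):
            if r != pivot_row and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b
                          for a, b in zip(aug[r], aug[pivot_row])]
        pivot_row += 1
    # Any leftover row now reads 0 = b; b must be zero for consistency.
    if any(row[-1] != 0 for row in aug[pivot_row:]):
        return None
    return [aug[i][-1] for i in range(n_cols)]


def find_integer_recurrence(seq, order):
    """Return integer coefficients [c_1, ..., c_order] such that
    seq[n] == sum(c_k * seq[n - k]) holds exactly for every available n,
    or None if no unique integer solution exists."""
    rows = [[seq[n - k] for k in range(1, order + 1)]
            for n in range(order, len(seq))]
    rhs = [seq[n] for n in range(order, len(seq))]
    if not rows:
        return None  # sequence too short to form any equation
    solution = solve_exact(rows, rhs)
    if solution is None or any(c.denominator != 1 for c in solution):
        return None
    return [int(c) for c in solution]


fibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21]
squares = [0, 1, 4, 9, 16, 25]
print(find_integer_recurrence(fibonacci, 2))  # coefficients of a(n-1), a(n-2)
print(find_integer_recurrence(squares, 2))    # no homogeneous order-2 recurrence
```

Unlike a least-squares fit, this check tolerates no residual: a single mismatching term or a single non-integer coefficient rejects the candidate. General linear Diophantine systems, which may have many integer solutions, require more machinery than this sketch covers.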
Keywords