IEEE Access (Jan 2024)
Arborescent Orthogonal Least Squares Regression for NARMAX-Based Black-Box Fitting
Abstract
This paper proposes a linear-algebra-based supervised machine learning algorithm for the symbolic representation of arbitrarily non-linear and recursive systems. It introduces multiple extensions to the algorithmic class of “Forward Orthogonal Least Squares Regressions” (FOrLSR), which performs dictionary-based sparse symbolic regressions. Provided only with the system’s input and output, the regression forms variable combinations and non-linear transformations from a given dictionary of analytic expressions and selects the optimal ones to represent the unknown system. This yields a “symbolic” system representation with the minimum number of terms (enforcing $L_{0}$-norm sparsity) while retaining the highest possible precision. The first proposed algorithm (rFOrLSR) restructures the FOrLSR into matrix form (enabling large-scale GPU and BLAS-like optimizations), makes it recursive (reducing the computational complexity from quadratic in the model length to linear), and allows regressors to be imposed (to incorporate user expertise and enable tree searches). Furthermore, the dictionary search is restructured into a breadth-first arborescence traversal kept sparse by five proposed theorems, four corollaries and one pruning mechanism, and a validation procedure is added for the final model selection. The proposed arborescence (AOrLSR) scans large segments of the search space, significantly increasing the probability of finding an optimal system representation, while computing only a marginal fraction of the search space. Together, the regression and arborescence form a solver for arbitrarily determined linear equation systems that maximizes sparsity in the solution vectors.
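To make the dictionary-based selection concrete, the following is a minimal NumPy sketch of the classical forward orthogonal least squares procedure (error-reduction-ratio ranking) that the proposed rFOrLSR and AOrLSR extend, not the paper’s implementation; the function name `forward_ols`, the tolerance, the toy NARX system and the candidate dictionary are illustrative assumptions.

```python
import numpy as np

def forward_ols(D, y, max_terms=5, tol=1e-4):
    """Greedy forward orthogonal least squares over a candidate dictionary.

    D : (N, M) matrix whose columns are candidate regressors (non-linear
        transforms of inputs / lagged outputs); y : (N,) system output.
    Returns the indices of the selected columns and their coefficients.
    (Illustrative sketch; not the rFOrLSR/AOrLSR algorithms of the paper.)
    """
    selected, W = [], []              # chosen column indices, orthogonalized regressors
    output_energy = y @ y
    unexplained = 1.0
    for _ in range(max_terms):
        best_err, best_idx, best_w = 0.0, None, None
        for j in range(D.shape[1]):
            if j in selected:
                continue
            w = D[:, j].astype(float)
            for q in W:               # Gram-Schmidt against already-selected terms
                w = w - (q @ D[:, j]) / (q @ q) * q
            if w @ w < 1e-12:         # candidate is numerically redundant
                continue
            err = (w @ y) ** 2 / ((w @ w) * output_energy)  # error reduction ratio
            if err > best_err:
                best_err, best_idx, best_w = err, j, w
        if best_idx is None:
            break
        selected.append(best_idx)
        W.append(best_w)
        unexplained -= best_err
        if unexplained < tol:         # enough output variance explained -> stop (sparsity)
            break
    theta, *_ = np.linalg.lstsq(D[:, selected], y, rcond=None)
    return selected, theta


# Toy NARX system: y[k] = 0.5*u[k-1] + 0.3*y[k-1]^2 + noise (assumed example).
rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 500)
y = np.zeros(500)
for k in range(1, 500):
    y[k] = 0.5 * u[k - 1] + 0.3 * y[k - 1] ** 2 + 0.01 * rng.standard_normal()

# Candidate dictionary: lagged terms and simple non-linear transforms of them.
cols = {"u[k-1]": u[:-1], "y[k-1]": y[:-1], "u[k-1]^2": u[:-1] ** 2,
        "y[k-1]^2": y[:-1] ** 2, "u[k-1]*y[k-1]": u[:-1] * y[:-1]}
D = np.column_stack(list(cols.values()))
idx, theta = forward_ols(D, y[1:])
print([list(cols)[i] for i in idx], theta)  # expected: u[k-1] and y[k-1]^2 dominate
```

The sketch illustrates the two ingredients the abstract refers to: a dictionary of analytic candidate terms, and a greedy orthogonal selection that keeps only the few terms needed to explain the output, which is the baseline the matrix-form, recursive and arborescent extensions build upon.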
Keywords