IEEE Access (Jan 2022)

Random Kernel Forests

  • Dmitry A. Devyatkin
  • Oleg G. Grigoriev

DOI
https://doi.org/10.1109/ACCESS.2022.3193385
Journal volume & issue
Vol. 10
pp. 77962–77979

Abstract

Random forests of axis-parallel decision trees still achieve competitive accuracy in various tasks; however, they have drawbacks that limit their applicability. In particular, they perform poorly on high-dimensional sparse data. A straightforward solution is to build forests of decision trees with oblique splits; however, most approaches to training such trees are computationally inefficient. Moreover, these ensembles tend to be unstable and to overfit, so they must be regularized to find a trade-off between complexity and generalization. This paper proposes an algorithm for training kernel decision trees and random forests. At the top level, the algorithm follows the common greedy procedure for training decision trees; at the decision stump level, however, it produces quasi-optimal oblique and kernel splits. At each stump, the algorithm finds a quasi-optimal distribution of classes between the subtrees and trains the stump by optimizing an SVM-like loss function with margin re-scaling, which allows it to optimize both the margin between subtree data and an arbitrary impurity criterion. We also attempt to derive uniform-stability-based generalization bounds for such forests and apply them to select a regularization technique. The obtained bounds explicitly account for the primary hyperparameters of the forests, trees, and decision stumps. They also reveal a relationship between outliers in the decision tree structure, stability, and generalization, and illustrate how forests smooth these outliers. We performed an experimental evaluation on several tasks, such as studying the reactions of social media users, image recognition, and bank scoring. The experiments show that the proposed algorithm trains ensembles that outperform other oblique and kernel forests on many commonly used datasets. In particular, the proposed algorithm achieves 99.1% accuracy on MNIST and 58.1% on CIFAR-10. We also confirmed that the selected regularization technique reduces overfitting on several datasets. The proposed algorithm may therefore be considered a small step toward customized and regularized ensembles of kernel trees that retain reasonable training performance on large datasets. We believe the proposed algorithm can serve as a building block for approaches to training context-aware models on complex problems with integer inputs and outputs, such as text mining and image recognition.
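To make the idea of greedy tree induction with oblique (SVM-based) splits concrete, the sketch below builds a small decision tree whose internal nodes are linear SVMs. It is an illustration only, not the paper's method: the class `ObliqueTree`, the crude majority-class-to-left assignment, and the `load_digits` usage are all assumptions made for this example, whereas the paper searches for a quasi-optimal class distribution and optimizes an SVM-like loss with margin re-scaling against an impurity criterion, and supports kernel splits.

```python
# Minimal sketch: a greedy decision tree with oblique (linear-SVM) splits.
# Hypothetical simplification of the idea in the abstract, not the paper's algorithm.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


class ObliqueTree:
    """Greedy top-down decision tree whose internal nodes are linear SVMs."""

    def __init__(self, max_depth=6, min_samples=20, C=1.0):
        self.max_depth, self.min_samples, self.C = max_depth, min_samples, C
        self.svm = None  # None marks a leaf

    def fit(self, X, y, depth=0):
        classes, counts = np.unique(y, return_counts=True)
        self.prediction = classes[np.argmax(counts)]  # leaf fallback
        if depth >= self.max_depth or len(classes) < 2 or len(y) < self.min_samples:
            return self
        # Crude class-to-subtree assignment: send the majority class left,
        # everything else right (the paper finds a quasi-optimal assignment).
        target = np.isin(y, [self.prediction]).astype(int)
        svm = LinearSVC(C=self.C, max_iter=5000).fit(X, target)
        mask = svm.predict(X).astype(bool)
        if mask.all() or not mask.any():  # degenerate split, keep the leaf
            return self
        self.svm = svm
        self.left = ObliqueTree(self.max_depth, self.min_samples, self.C).fit(X[mask], y[mask], depth + 1)
        self.right = ObliqueTree(self.max_depth, self.min_samples, self.C).fit(X[~mask], y[~mask], depth + 1)
        return self

    def predict(self, X):
        if self.svm is None or len(X) == 0:
            return np.full(len(X), self.prediction)
        mask = self.svm.predict(X).astype(bool)
        out = np.empty(len(X), dtype=object)
        out[mask] = self.left.predict(X[mask])
        out[~mask] = self.right.predict(X[~mask])
        return out


X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = ObliqueTree().fit(X_tr, y_tr)
print("test accuracy:", np.mean(tree.predict(X_te) == y_te))
```

A forest in the spirit of the paper would train many such trees on bootstrap samples and/or random feature subsets and average their votes; regularization of the node-level SVMs (the `C` parameter here) is where the stability considerations discussed in the abstract come into play.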

Keywords