Proceedings of the XXth Conference of Open Innovations Association FRUCT (May 2023)

Recommending Machine Learning Pipelines Based on Cumulative Metadata

  • Maxim Aliev,
  • Sergey B Muravyov

DOI
https://doi.org/10.5281/zenodo.8004565
Journal volume & issue
Vol. 33, no. 2
pp. 331 – 334

Abstract

The problem of automated machine learning pipeline design for a given supervised learning task is usually solved with optimization methods, which entails high time complexity. An alternative is meta-learning, which consists in training a model on metadata from the results of solving similar problems. This approach has its own limitation: the model needs a large amount of accumulated knowledge to be effective. Based on the literature analyzed by the authors, this problem remains relevant. In particular, auto-sklearn, one of the state-of-the-art solutions, uses a predetermined set of metadata that does not change based on new run results. The ontological data model proposed by the authors, together with a mechanism of automated knowledge enrichment, is designed to reduce the impact of this limitation. Currently, the pipeline recommendation process includes two scenarios: in the first, a hash representation of the original data set is already present in storage; in the opposite scenario, the pipeline is recommended based on Bayesian optimization over the global space of machine learning algorithms and their associated hyperparameters. As part of the experiment, the pipeline inference time was measured for both scenarios. The results confirmed the superiority of the metadata-driven recommendation and showed that this advantage grows as the dimension of the input data increases.
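The two recommendation scenarios can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `dataset_hash`, `search_pipeline`, `recommend`, the in-memory `store` dictionary, and the toy search space are all hypothetical. A plain random search stands in for Bayesian optimization to keep the example dependency-free, and writing the search result back to the store is only an assumed reading of the knowledge-enrichment mechanism.

```python
import hashlib
import random
from typing import Callable, Dict, List, Tuple

Pipeline = Dict[str, object]


def dataset_hash(rows: List[Tuple[float, ...]]) -> str:
    """Hash the data set content (repr-based; for illustration only)."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def search_pipeline(score: Callable[[Pipeline], float],
                    n_trials: int = 20, seed: int = 0) -> Pipeline:
    """Fallback search over a toy space.

    Assumption: random search replaces Bayesian optimization here.
    """
    rng = random.Random(seed)
    best: Pipeline = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        candidate: Pipeline = {
            "algorithm": rng.choice(["tree", "knn"]),
            "param": rng.randint(1, 10),
        }
        current = score(candidate)
        if current > best_score:
            best, best_score = candidate, current
    return best


def recommend(rows: List[Tuple[float, ...]],
              store: Dict[str, Pipeline],
              score: Callable[[Pipeline], float]) -> Tuple[Pipeline, str]:
    """Return a pipeline and the scenario that produced it."""
    key = dataset_hash(rows)
    if key in store:
        # Scenario 1: the hash is already in storage, so reuse the metadata.
        return store[key], "metadata"
    # Scenario 2: no stored hash, so run the (expensive) search.
    pipeline = search_pipeline(score)
    store[key] = pipeline  # assumed enrichment step: remember the result
    return pipeline, "search"
```

In this sketch, a second call to `recommend` with the same rows and the same store skips the search entirely, which is the source of the inference-time difference the experiment measures.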

Keywords