AutoML with Bayesian Optimizations for Big Data Management

Aristeidis Karras; Christos Karras; Nikolaos Schizas; Markos Avlonitis; Spyros Sioutas

doi:10.3390/info14040223

Information (Apr 2023)


Affiliations

Aristeidis Karras: Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece
Christos Karras: Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece
Nikolaos Schizas: Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece
Markos Avlonitis: Department of Informatics, Ionian University, 49100 Kerkira, Greece
Spyros Sioutas: Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece

DOI: https://doi.org/10.3390/info14040223
Journal volume & issue: Vol. 14, no. 4, p. 223

Abstract


The field of automated machine learning (AutoML) has gained significant attention in recent years due to its ability to automate the process of building and optimizing machine learning models. However, the growing volume of data being generated presents new challenges for AutoML systems in terms of big data management. In this paper, we present Fabolas and learning curve extrapolation as two methods for accelerating hyperparameter optimization. We also present four methods for speeding up training: Bag of Little Bootstraps, k-means clustering for Support Vector Machines, subsample size selection for gradient descent, and subsampling for logistic regression. Additionally, we discuss the use of Markov Chain Monte Carlo (MCMC) methods and other stochastic optimization techniques to improve the efficiency of AutoML systems in managing big data. Because these methods improve different facets of the training process, they can be combined in diverse ways to gain further speedups. We review several promising combinations and provide a comprehensive overview of the current state of AutoML and its potential for managing big data in various industries. Finally, we highlight the importance of parallel computing and distributed systems for improving the scalability of AutoML systems that work with big data.
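Of the training-acceleration methods listed above, the Bag of Little Bootstraps (BLB) is the easiest to illustrate in isolation. The following is a minimal, generic sketch of the standard BLB procedure, not the authors' implementation. The choice of the sample mean as the estimator, the standard error as the quality measure, and the values of `gamma`, `s`, and `r` are illustrative assumptions. Each of the `s` subsets holds only b = n^gamma distinct points. Each resample assigns multinomial counts summing to n, so every estimate behaves like one computed on a full-size bootstrap sample while touching only b distinct values.

```python
import math
import random
import statistics
from collections import Counter


def weighted_mean(values, counts):
    """Mean of `values` where values[i] is repeated counts[i] times."""
    total = sum(counts)
    return sum(v * c for v, c in zip(values, counts)) / total


def blb_standard_error(data, gamma=0.6, s=5, r=50, seed=0):
    """Bag of Little Bootstraps estimate of the standard error of the mean.

    Illustrative defaults (assumptions): gamma sets the subset size
    b = n**gamma, s is the number of subsets, r the resamples per subset.
    """
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))
    per_subset_se = []
    for _ in range(s):
        # Draw b distinct points without replacement.
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(r):
            # Multinomial(n, uniform over b) counts, drawn as n uniform picks.
            tally = Counter(rng.choices(range(b), k=n))
            counts = [tally.get(i, 0) for i in range(b)]
            estimates.append(weighted_mean(subset, counts))
        # Quality assessment for this subset: spread of the r estimates.
        per_subset_se.append(statistics.stdev(estimates))
    # Average the per-subset assessments into the final BLB estimate.
    return statistics.fmean(per_subset_se)


if __name__ == "__main__":
    rng = random.Random(42)
    sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    # Compare the BLB estimate with the analytical standard error 1/sqrt(n).
    print(blb_standard_error(sample), 1 / math.sqrt(len(sample)))
```

For 2000 standard-normal values, the BLB estimate should land close to the analytical standard error of the mean, 1/√2000 ≈ 0.022. Drawing the counts as n uniform picks keeps the sketch dependency-free. A practical implementation would instead sample the multinomial counts directly (for example with NumPy's `Generator.multinomial`) to avoid materializing n draws per resample.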

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
