IEEE Access (Jan 2023)

Comparing Automated Machine Learning Against an Off-the-Shelf Pattern-Based Classifier in a Class Imbalance Problem: Predicting University Dropout

  • Leonardo Cañete-Sifuentes,
  • Víctor Robles,
  • Ernestina Menasalvas,
  • Raúl Monroy

DOI
https://doi.org/10.1109/ACCESS.2023.3336596
Journal volume & issue
Vol. 11
pp. 139147–139156

Abstract


When facing a classification problem, data science practitioners must search through an armory of methods. Often, practitioners are tempted to use off-the-shelf classifiers, including automated Machine Learning (AutoML) toolboxes; however, stand-alone classifiers are not applicable to every problem, and AutoML may be time-consuming, raising environmental and ethical concerns. Compounding the problem, (commercial) AutoML toolboxes are black boxes, and practitioners are not allowed to extend them with new methods to improve their classification performance. Our main objective is to show that an off-the-shelf classifier designed for class imbalance problems can achieve performance similar to that of an AutoML toolbox. To do so, we first present the student dropout prediction case study, which most off-the-shelf classifiers find difficult to solve due to the problem’s inherent class imbalance. We show that Microsoft Azure AutoML outperforms several popular stand-alone classifiers. However, multivariate PBC4cip, an off-the-shelf classifier especially designed to deal with class imbalance, yields results that are just as good as those of Microsoft Azure AutoML, with the advantage that the expensive steps of mechanism selection and tuning are avoided. Our studies show that data science practitioners need to build themselves a taxonomy of classification mechanisms in terms of the properties of the problem to solve. Additionally, AutoML platforms should let scientists modify the armory of classifiers and provide an explanation of both mechanism selection and mechanism tuning, so that practitioners can learn further lessons.
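The comparison the abstract describes (a standard off-the-shelf classifier versus an imbalance-aware one on a skewed binary problem) can be sketched as below. This is a minimal illustration of the evaluation protocol only, not the paper's actual pipeline: the study uses Microsoft Azure AutoML and multivariate PBC4cip on real student data, whereas here generic scikit-learn / imbalanced-learn stand-ins are applied to a synthetic dataset with an assumed ~10% minority (dropout) class.

```python
# Sketch of an imbalance-aware comparison; scikit-learn / imbalanced-learn
# components stand in for the paper's Azure AutoML and PBC4cip classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split
from imblearn.ensemble import BalancedRandomForestClassifier

# Synthetic stand-in for the dropout data: ~10% positive (dropout) class.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

for name, clf in [
    ("plain random forest", RandomForestClassifier(random_state=0)),
    ("imbalance-aware forest", BalancedRandomForestClassifier(random_state=0)),
]:
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]
    # Accuracy is misleading under class imbalance, so report AUC and
    # balanced accuracy, which weight both classes.
    print(f"{name}: AUC={roc_auc_score(y_test, proba):.3f}, "
          f"balanced acc={balanced_accuracy_score(y_test, clf.predict(X_test)):.3f}")
```

On such data the imbalance-aware model typically recovers more of the minority class, which is the property that makes PBC4cip competitive with AutoML in the study.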

Keywords