IEEE Access (Jan 2021)

Novel Meta-Features for Automated Machine Learning Model Selection in Anomaly Detection

  • Milos Kotlar,
  • Marija Punt,
  • Zaharije Radivojevic,
  • Milos Cvetanovic,
  • Veljko Milutinovic

DOI
https://doi.org/10.1109/ACCESS.2021.3090936
Journal volume & issue
Vol. 9
pp. 89675 – 89687

Abstract

A growing number of research papers examine automated machine learning (AutoML) frameworks, which are a promising way to build complex machine learning models without human expertise or assistance. The key challenge in enabling AutoML frameworks to build an efficient model for anomaly detection is determining the best underlying model for a given task and optimization metric. Meta-learning approaches based on a set of meta-features that describe data properties can enable efficient model selection in AutoML frameworks. However, existing meta-learning approaches based on statistical and information-theoretic meta-features require large amounts of data and computational resources to extract those properties. This paper proposes a novel set of meta-features for model selection in anomaly detection tasks, based on domain-specific properties of the data. The set overcomes the shortcomings of existing meta-features by using simple but effective meta-features that can be efficiently extracted or estimated from a small amount of data. Experiments with 63 datasets from different repositories with varying schemas show that the proposed set achieves 87% accuracy for model selection, compared with 74% for simple meta-features, 68% for statistical meta-features, 70% for information-theoretic meta-features, and 73% for the comprehensive set of meta-features provided by pyMFE. These results demonstrate that the proposed set can be adopted by AutoML frameworks across a diverse range of domains.
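
To make the idea of meta-feature-based model selection concrete, the following minimal sketch describes a dataset by a short meta-feature vector and picks the detector that performed best on the most similar previously seen dataset. It is an illustration under stated assumptions, not the method from the paper: the three meta-features, the function names `extract_meta_features` and `select_model`, the model labels, and the tiny meta-dataset are all hypothetical, and the 1-nearest-neighbour lookup is just one simple choice of meta-learner.

```python
import math
from statistics import mean, pstdev


def extract_meta_features(rows):
    """Return a small, illustrative meta-feature vector for a numeric table.

    These three features are placeholders chosen for the example; they are
    NOT the domain-specific meta-features proposed in the paper.
    """
    n_rows = len(rows)
    n_cols = len(rows[0])
    columns = list(zip(*rows))  # transpose rows into columns
    avg_std = mean(pstdev(col) for col in columns)
    return [math.log10(n_rows), float(n_cols), avg_std]


def select_model(meta_features, meta_dataset):
    """1-nearest-neighbour meta-learner (an assumed, simple choice).

    meta_dataset is a list of (meta_feature_vector, best_model_name) pairs
    collected from datasets whose best detector is already known.
    """
    best_label, best_dist = None, math.inf
    for known_features, best_model in meta_dataset:
        dist = math.dist(meta_features, known_features)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = best_model, dist
    return best_label


# Hypothetical meta-dataset: values and labels are made up for illustration.
META_DATASET = [
    ([2.0, 2.0, 1.0], "isolation_forest"),
    ([4.0, 20.0, 5.0], "autoencoder"),
]

# A synthetic 100-row, 2-column table standing in for a new task.
new_data = [[float(i % 10), float(i % 3)] for i in range(100)]
features = extract_meta_features(new_data)
chosen = select_model(features, META_DATASET)
print(features, chosen)
```

In practice the meta-dataset would hold one entry per benchmark dataset, and the meta-features would usually be scaled before distances are computed so that no single feature dominates.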

Keywords