Entropy (Nov 2017)

Instance Selection for Classifier Performance Estimation in Meta Learning

  • Marcin Blachnik

DOI
https://doi.org/10.3390/e19110583
Journal volume & issue
Vol. 19, no. 11
p. 583

Abstract

Read online

Building an accurate prediction model is challenging and requires appropriate model selection. This process is very time consuming but can be accelerated with meta-learning–automatic model recommendation by estimating the performances of given prediction models without training them. Meta-learning utilizes metadata extracted from the dataset to effectively estimate the accuracy of the model in question. To achieve that goal, metadata descriptors must be gathered efficiently and must be informative to allow the precise estimation of prediction accuracy. In this paper, a new type of metadata descriptors is analyzed. These descriptors are based on the compression level obtained from the instance selection methods at the data-preprocessing stage. To verify their suitability, two types of experiments on real-world datasets have been conducted. In the first one, 11 instance selection methods were examined in order to validate the compression–accuracy relation for three classifiers: k-nearest neighbors (kNN), support vector machine (SVM), and random forest. From this analysis, two methods are recommended (instance-based learning type 2 (IB2), and edited nearest neighbor (ENN)) which are then compared with the state-of-the-art metaset descriptors. The obtained results confirm that the two suggested compression-based meta-features help to predict accuracy of the base model much more accurately than the state-of-the-art solution.

Keywords