IEEE Access (Jan 2019)

A Meta-Learning Recommendation System for Characterizing Unsupervised Problems: On Using Quality Indices to Describe Data Conformations

  • Jose A. Saez,
  • Emilio Corchado

DOI
https://doi.org/10.1109/ACCESS.2019.2917004
Journal volume & issue
Vol. 7
pp. 63247 – 63263

Abstract

Clustering a new unsupervised problem usually requires knowing both whether the samples can be separated into different groups and how many groups there are. This information, which has a great impact on the results obtained, is generally unknown beforehand. A widely explored line of research proposes using metrics known as quality indices to determine the number of clusters in a dataset. However, the results may vary depending on the metric chosen. This research analyzes a novel meta-learning system for determining the number of clusters in unsupervised data, called Meta-Learning Recommendation System for Cluster Cardinality Estimation (MLRS-CCE). It is based on the idea of using quality metrics not as a solution to the problem, but as a means of characterizing the inner structure of each dataset, and of employing this information to detect when unsupervised data are not uniform and to suggest additional information about the number of clusters in the data. To achieve these goals, a large collection of both real-world and synthetic datasets, in which the number of clusters is known a priori, is used to build the system and check its performance. The meta-learning system was successfully tested on these data, showing that it is sufficiently accurate both in separating uniform from non-uniform data and in predicting cluster cardinality when compared with the results given by individual quality indices.
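
The central idea of the abstract, treating quality-index values as a description of a dataset instead of as the final answer, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (kmeans, silhouette, meta_features) are hypothetical, a single index (the mean silhouette width) stands in for the set of indices used by MLRS-CCE, and a plain k-means produces the candidate partitions. The meta-learner that would be trained on such vectors, using datasets whose number of clusters is known, is not shown.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (illustrative stand-in); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct points as initial centers
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width; returns 0.0 when there are fewer than two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton clusters contribute 0 by convention
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def meta_features(points, k_values=range(2, 7)):
    """Describe a dataset by the index value obtained at each candidate k."""
    return [silhouette(points, kmeans(points, k)) for k in k_values]


if __name__ == "__main__":
    # Toy data: three small 3x3 grids of points placed far apart.
    data = [
        (cx + 0.1 * dx, cy + 0.1 * dy)
        for cx, cy in [(0, 0), (5, 5), (0, 5)]
        for dx in range(3)
        for dy in range(3)
    ]
    print(meta_features(data))  # one silhouette value per k in 2..6
```

In this sketch the whole vector of index values, not just the k at which the index peaks, is kept as the description of the dataset. That choice mirrors the abstract's point that individual indices can disagree with one another, so their values are more useful as input to a learned model than as a direct answer.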

Keywords