Advances in Electrical and Computer Engineering (Aug 2015)
Evaluation of Subspace Clustering Using Internal Validity Measures
Abstract
Different clustering algorithms, or even the same algorithm with different input parameters, can produce different partitions of the same data. Clustering validity measures are then applied to determine which results are of better quality. External measures can be used to evaluate clustering algorithms on datasets with a known data division. In real scenarios, however, such information is not available, and internal measures are often applied instead. Subspace clustering techniques can create clusters that use different subsets of the full feature space. For this reason, calculating internal measures with full-feature-space distance metrics (e.g., Euclidean distance) is not justified. In this paper, we propose a novel approach to evaluating subspace clustering with internal quality measures: we apply distance metrics that can handle missing attribute values or that are used in dimensionality reduction techniques. Our approach is verified on eight publicly available, widely used datasets. The results are promising and indicate that the proposed distance metrics are suitable for calculating the examined internal validation measures.
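To illustrate the idea of a distance metric that tolerates missing attribute values, the following minimal sketch treats features outside a cluster's subspace as missing and computes a rescaled partial Euclidean distance over the remaining ones. This particular formula, the function names `partial_euclidean` and `mask_to_subspace`, and the example points are assumptions chosen for illustration; the abstract does not state which metrics are actually evaluated.

```python
import math


def mask_to_subspace(point, dims):
    """Hypothetical helper: keep only the features listed in `dims`
    and mark every other feature as missing (None)."""
    return [v if i in dims else None for i, v in enumerate(point)]


def partial_euclidean(x, y):
    """Assumed missing-value-tolerant distance: Euclidean distance over
    the features present in both points, rescaled by the ratio of the
    total number of features to the number of shared features."""
    n = len(x)
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not shared:
        # No feature is observed in both points: the distance is undefined.
        return math.nan
    squared = sum((a - b) ** 2 for a, b in shared)
    return math.sqrt(n / len(shared) * squared)


# Two points in a 4-dimensional space; the cluster is assumed to live
# in the subspace spanned by features 0 and 1 only.
p = [0.0, 0.0, 5.0, 9.0]
q = [3.0, 4.0, 1.0, 2.0]
subspace = {0, 1}

d = partial_euclidean(mask_to_subspace(p, subspace), mask_to_subspace(q, subspace))
print(d)
```

A distance of this kind could then be passed, as a precomputed pairwise matrix, to an internal validity measure in place of the full-space Euclidean distance.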
Keywords