IEEE Access (Jan 2018)

A Generalized Clustering Method Based on Validity Indices and Membership Functions

  • Edwin Aldana-Bobadilla,
  • Ivan Lopez-Arevalo,
  • Hiram Galeana-Zapien,
  • Melesio Crespo-Sanchez

DOI
https://doi.org/10.1109/ACCESS.2018.2882408
Journal volume & issue
Vol. 6
pp. 75912 – 75923

Abstract

Clustering is an important data analysis task that seeks a partition of an unlabeled dataset based on similarity relationships among its elements. Typically, such similarity is determined by a proximity measure or distance. The optimal partition is then the one that minimizes the distance among elements belonging to the same subset and maximizes the distance among elements from different subsets. The way in which the optimal partition is found is called a clustering method. The adequacy of the partition found is commonly assessed in terms of a validity index. In this paper, we propose a clustering method referred to as quality-driven search for optimal partition (QDSOC), in which the search for the optimal partition is driven directly by a validity index instead of a proximity measure. Our approach explores a large solution space efficiently via a breed of genetic algorithm, the so-called eclectic genetic algorithm. Unlike existing clustering methods, the proposed QDSOC not only finds the optimal partition but also provides a mathematical model of that partition, expressed as a representation based on membership functions. This model describes the points that belong to the subsets in the partition found. Thus, by using this model, we can predict the membership of new objects without performing the search process again. As part of the experimental evaluation, the proposed QDSOC method is compared with k-means and self-organizing maps (SOMs), two well-known clustering approaches. All three methods were applied to a wide sample of clustering problems and evaluated using three different validity indices. From the results obtained, we demonstrate that QDSOC statistically outperforms k-means and SOMs. We also point out that our approach does not incur excessive computational overhead with respect to such traditional clustering methods.
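
The idea of letting a validity index, rather than a proximity measure alone, steer the search can be illustrated with a minimal Python sketch. This is not the authors' QDSOC implementation. The index used here (mean within-cluster distance divided by the smallest centroid separation), the mutation-only search that stands in for the eclectic genetic algorithm, the farthest-point initialization, and the nearest-centroid rule that stands in for the membership functions are all simplifying assumptions. The function names are hypothetical.

```python
import math
import random

def assign(points, centroids):
    """Label each point with the index of its nearest centroid.
    (Assumed stand-in for the paper's membership functions.)"""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]

def validity_index(points, centroids):
    """Assumed index, lower is better: mean within-cluster distance
    divided by the smallest distance between two centroids (needs k >= 2)."""
    labels = assign(points, centroids)
    if len(set(labels)) < len(centroids):
        return math.inf  # penalize partitions that leave a cluster empty
    within = sum(math.dist(p, centroids[l])
                 for p, l in zip(points, labels)) / len(points)
    between = min(math.dist(a, b)
                  for i, a in enumerate(centroids) for b in centroids[i + 1:])
    return within / between if between > 0 else math.inf

def initial_centroids(points, k):
    """Farthest-point initialization: start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    chosen = [list(points[0])]
    while len(chosen) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in chosen))
        chosen.append(list(far))
    return chosen

def search_partition(points, k, generations=300, step=0.5, seed=0):
    """Mutation-only search: perturb the centroids and keep the candidate
    only when the validity index improves."""
    rng = random.Random(seed)
    best = initial_centroids(points, k)
    best_score = validity_index(points, best)
    for _ in range(generations):
        candidate = [[x + rng.gauss(0, step) for x in c] for c in best]
        score = validity_index(points, candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score

# Two well-separated synthetic blobs (illustrative data, not from the paper).
data_rng = random.Random(1)
points = ([(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(30)]
          + [(data_rng.gauss(10, 0.3), data_rng.gauss(10, 0.3)) for _ in range(30)])

centroids, score = search_partition(points, k=2)

# New objects are labeled from the stored model without repeating the search.
new_labels = assign([(0.1, -0.2), (9.8, 10.1)], centroids)
```

Because the fitness function is the validity index itself, swapping in a different index changes what the search optimizes without touching the search loop. Once the centroids are found, the last line labels new objects from the stored model alone, which mirrors, in a much cruder form, the prediction step described in the abstract.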

Keywords