IEEE Access (Jan 2019)

Automatic Fuzzy Clustering Using Non-Dominated Sorting Particle Swarm Optimization Algorithm for Categorical Data

  • Thi Phuong Quyen Nguyen,
  • R. J. Kuo

DOI
https://doi.org/10.1109/ACCESS.2019.2927593
Journal volume & issue
Vol. 7
pp. 99721 – 99734

Abstract

Read online

Categorical data clustering has been attracted a lot of attention recently due to its necessary in the real-world applications. Many clustering methods have been proposed for categorical data. However, most of the existing algorithms require the predefined number of clusters which is usually unavailable in real-world problems. Only a few works focused on automatic clustering, but mainly handled for numerical data. This study develops a novel automatic fuzzy clustering using non-dominated sorting particle swarm optimization (AFC-NSPSO) algorithm for categorical data. The proposed AFC-NSPSO algorithm can automatically identify the optimal number of clusters and exploit the clustering result with the corresponding selected number of clusters. In addition, a new technique is investigated to identify the maximum number of clusters in a dataset based on the local density. To select a final solution in the first Pareto front, some internal validation indices are used. The performance of the proposed AFC-NSPSO on the real-world datasets collected from the UCI machine learning repository exhibits effectiveness compared with some other existing automatic categorical clustering algorithms. Besides, this study also applies the proposed algorithm to analyze a real-world case study with an unknown number of clusters.

Keywords