IEEE Access (Jan 2022)

A Novel Cluster Prediction Approach Based on Locality-Sensitive Hashing for Fuzzy Clustering of Categorical Data

  • Toan Nguyen Mau,
  • Yasushi Inoguchi,
  • Van-Nam Huynh

DOI
https://doi.org/10.1109/ACCESS.2022.3162690
Journal volume & issue
Vol. 10
pp. 34196 – 34206

Abstract

This paper addresses the problem of fuzzy clustering for categorical data. Over the last two decades, many attempts have been made to extend the $k$-means algorithm to categorical data because of its simplicity and efficiency. However, because $k$-means-like algorithms are local optimization methods, their clustering results are highly sensitive to initialization. In this paper, we propose using Locality-Sensitive Hashing (LSH) to reduce the dimensionality of categorical data and to predict the initial fuzzy clusters in the resulting low-dimensional space. Unlike existing cluster initialization methods, which aim to create only crisp initial clusters, the proposed method aims to predict ‘high quality’ fuzzy clusters at the initialization step before proceeding in the $k$-means-like fashion. The numerical results show that the proposed method yields relatively accurate results on 16 UCI datasets and outperforms all other related approaches in terms of both crisp and fuzzy clustering effectiveness.
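
To make the general idea concrete, the following is a minimal sketch of LSH-based fuzzy initialization for categorical data. It is not the authors' algorithm: it assumes a simple attribute-sampling (bit-sampling) hash, takes the modes of the largest hash buckets as initial centers, and applies the standard fuzzy $k$-modes membership formula with Hamming distance. All function names and parameters (`lsh_buckets`, `initial_fuzzy_memberships`, `n_sampled`, `m`) are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def lsh_buckets(data, n_sampled, seed=0):
    """Assumed hash: key each record by its values on a few random attributes.

    Records that agree on the sampled attributes fall into the same bucket,
    so similar records collide with higher probability.
    """
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(data[0])), n_sampled))
    buckets = defaultdict(list)
    for i, row in enumerate(data):
        buckets[tuple(row[j] for j in idx)].append(i)
    return buckets


def mode(rows):
    """Attribute-wise most frequent value of a group of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def initial_fuzzy_memberships(data, k, n_sampled=2, m=2.0, seed=0):
    """Return (centers, memberships) predicted from the k largest buckets.

    If fewer than k buckets exist, fewer than k centers are returned.
    """
    buckets = lsh_buckets(data, n_sampled, seed)
    largest = sorted(buckets.values(), key=len, reverse=True)[:k]
    centers = [mode([data[i] for i in members]) for members in largest]

    memberships = []
    for row in data:
        d = [hamming(row, c) for c in centers]
        if 0 in d:
            # Record coincides with one or more centers: share membership
            # equally among them and give zero to the rest.
            zeros = d.count(0)
            u = [1.0 / zeros if dj == 0 else 0.0 for dj in d]
        else:
            # Standard fuzzy k-modes update with fuzziness exponent m > 1.
            u = [1.0 / sum((dj / dl) ** (1.0 / (m - 1.0)) for dl in d)
                 for dj in d]
        memberships.append(u)
    return centers, memberships
```

Each row of the returned membership matrix sums to 1, so it can seed a fuzzy $k$-modes-style iteration in place of a random or crisp initialization. Choosing `n_sampled` trades bucket purity against bucket size, and the value used here is arbitrary.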

Keywords