IEEE Access (Jan 2024)

SM-DPC: Clustering by Fast Search and Find of Density Peaks Based on SNN With Multi-Cluster Fusion Strategy

  • Shibo Zhou,
  • Bingbing Peng,
  • Wenpeng Xu,
  • Luzhen Ren

DOI
https://doi.org/10.1109/ACCESS.2024.3404917
Journal volume & issue
Vol. 12
pp. 76413 – 76431

Abstract

The Clustering by Fast Search and Find of Density Peaks (DPC) algorithm is a clustering method that automatically identifies cluster centers based on density and relative distance. Its advantages include the ability to identify arbitrarily shaped clusters and the need for few input parameters. However, the density measure used in DPC does not consider the spatial distribution of the sample points in the dataset, so clustering performance is suboptimal on datasets whose clusters differ significantly in density. In addition, its assignment method for non-central sample points has low error tolerance: one misassigned point can cause successive assignment errors, a domino effect that ultimately leads to poor clustering accuracy. To address these shortcomings, we propose an improved DPC algorithm based on shared nearest neighbors (SNN) and multi-cluster fusion (SM-DPC). The local density of sample points is redefined using K-nearest neighbors, which makes the density metric more consistent with the local structure of the dataset. A two-step allocation strategy based on shared nearest neighbors is proposed to improve the accuracy with which non-central sample points are assigned. A multi-cluster fusion strategy is used to correct the bias in cluster-center selection on datasets whose sample points are not uniformly distributed. The experimental results demonstrate that SM-DPC effectively clusters datasets with arbitrary shapes and density distributions. Furthermore, it exhibits superior performance and broader adaptability to different types of datasets compared with DBSCAN, K-means, and other improved DPC algorithms.
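The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the K-nearest-neighbor density used here (the exponential of the negative mean distance to the k nearest neighbors) is an assumed stand-in, and all function names are hypothetical. The relative distance and the chain assignment follow the original DPC scheme, which is where the domino effect arises. The shared-neighbor count is only the basic SNN ingredient, not the paper's two-step allocation.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_density(D, k):
    """Assumed KNN density: exp(-mean distance to the k nearest neighbors).
    Illustrative only; SM-DPC's actual definition is not in the abstract."""
    knn_d = np.sort(D, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    return np.exp(-knn_d.mean(axis=1))

def relative_distance(D, rho):
    """DPC relative distance: distance to the nearest point of higher density.
    The densest point gets its maximum distance to any other point."""
    n = len(rho)
    delta = np.empty(n)
    nearest_higher = np.full(n, -1)
    order = np.argsort(-rho, kind="stable")  # densest first
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = D[i].max()
        else:
            higher = order[:rank]
            j = higher[np.argmin(D[i, higher])]
            delta[i] = D[i, j]
            nearest_higher[i] = j
    return delta, nearest_higher

def assign_labels(rho, delta, nearest_higher, n_clusters):
    """Original DPC assignment: pick centers by largest rho * delta, then give
    each remaining point the label of its nearest higher-density neighbor.
    One wrong link is inherited by every point chained after it."""
    gamma = rho * delta
    centers = np.argsort(-gamma)[:n_clusters]
    labels = np.full(len(rho), -1)
    labels[centers] = np.arange(n_clusters)
    for i in np.argsort(-rho, kind="stable"):  # same order as above
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels

def shared_neighbor_count(D, k, i, j):
    """Number of points that appear in the k-nearest-neighbor lists of both i and j."""
    knn = np.argsort(D, axis=1)[:, 1:k + 1]
    return len(set(knn[i]) & set(knn[j]))
```

Under these assumptions, calling `pairwise_distances`, `knn_density`, `relative_distance`, and `assign_labels` in that order on a small array of 2-D points yields one label per point. The sketch assumes distinct points, since duplicates would break the exclusion of each point from its own neighbor list.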

Keywords