IEEE Access (Jan 2024)

Graph Distance and Adaptive K-Nearest Neighbors Selection-Based Density Peak Clustering

  • Yuqin Sun,
  • Jingcong Wang,
  • Yuan Sun,
  • Pengcheng Zhang,
  • Tianyi Wang

DOI
https://doi.org/10.1109/ACCESS.2024.3403128
Journal volume & issue
Vol. 12
pp. 71783 – 71796

Abstract

Read online

Density Peak Clustering (DPC) is known for its rapid identification of cluster centers and successful clustering tasks. However, traditional DPC encounters several issues, which include simplifications in local density and distance metrics, a non-robust single-allocation strategy and limited fault tolerance. To address these challenges, this study introduces an innovative density peak clustering algorithm, named Graph Distance and Adaptive K-Nearest Neighbors Selection-Based Density Peak Clustering (GAK-DPC). Our goal with the approach is to enhance the algorithm’s adaptability to non-linear and complex data structures. We achieve this by replacing the traditional Euclidean distance with graph distance. Additionally, we redefine the method for computing local density based on information from K-nearest neighbor data points. By introducing the concept of natural neighbors, the neighborhood radius r is obtained when all instances in the dataset have at least one natural neighbor. Then for the current data point, the number of data points falling within a circle centered on it with radius r is counted as the K-value of that data point. Thus, we achieve the adaptive selection of the K-value. This adaptive K-value strategy takes into account the dataset’s characteristics and inter-point neighbor relationships, which enhances the algorithm’s adaptability and robustness. Finally, we optimize the secondary allocation strategy for sample points to improve the algorithm’s fault tolerance. By conducting comparisons with traditional clustering algorithms on UCI datasets and synthetic datasets, we demonstrate the effectiveness of GAK-DPC.

Keywords