IEEE Access (Jan 2021)
An Adaptive Density-Sensitive Similarity Measure Based Spectral Clustering Algorithm and Its Parallelization
Abstract
The clustering quality of the spectral clustering algorithm depends on how the similarity between samples is calculated. Calculating the similarity with the Gaussian kernel function can yield better clustering results, but the outcome relies on the setting of the kernel parameter. Therefore, an adaptive density-sensitive similarity measure based spectral clustering (DSSC) algorithm is proposed to improve the clustering results. First, the Euclidean distances between samples are calculated to find the nearest neighbors of each sample. Second, the standard deviation of the distances between each sample and its nearest neighbors is calculated as the density parameter. Third, the density-sensitive distances between each sample and its nearest neighbors are calculated. Finally, the similarities between each sample and its nearest neighbors are calculated to construct a similarity matrix. In addition, the proposed DSSC algorithm is parallelized on the Dask distributed parallel computing platform with CPU+GPU, which improves its computational efficiency by taking full advantage of CPU and GPU resources. A series of experiments on several synthetic datasets and UCI datasets is conducted to verify the effectiveness of the proposed DSSC algorithm, and the results show that it not only achieves satisfactory clustering results but also performs large-scale clustering analysis more efficiently.
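The four steps above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not give the density-sensitive distance formula, so steps three and four are replaced here by an assumed locally scaled Gaussian, exp(-d_ij^2 / (sigma_i * sigma_j)), in the style of self-tuning spectral clustering. The function name `knn_similarity`, the neighbor count `k`, and the symmetrization rule are likewise hypothetical choices made for illustration.

```python
import numpy as np

def knn_similarity(X, k=5):
    """Sketch of a nearest-neighbor similarity matrix with a per-sample scale.

    ASSUMPTION: the locally scaled Gaussian used for the similarity is a
    stand-in; the paper's density-sensitive distance is not reproduced here.
    """
    n = X.shape[0]

    # Step 1: pairwise Euclidean distances, then the k nearest neighbors.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each sample from its own neighbors
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)

    # Step 2: standard deviation of the neighbor distances as the
    # per-sample density parameter (small epsilon avoids division by zero).
    sigma = nn_dist.std(axis=1) + 1e-12

    # Steps 3-4 (assumed form): similarity between each sample and its
    # nearest neighbors only; all other entries stay zero.
    rows = np.repeat(np.arange(n), k)
    cols = nn_idx.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-dist[rows, cols] ** 2 / (sigma[rows] * sigma[cols]))

    # Symmetrize so that W can serve as a graph affinity matrix
    # (an illustrative choice, not taken from the paper).
    return np.maximum(W, W.T)
```

Calling `knn_similarity(X, k=5)` on an array of shape `(n_samples, n_features)` returns a symmetric `n_samples × n_samples` matrix with a zero diagonal, which could then be passed to a standard spectral clustering routine. The dense pairwise distance computation needs memory quadratic in the number of samples, so the sketch is only suitable for small datasets.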
Keywords