Tehnički Vjesnik (Jan 2020)

Clustering Algorithm Based on Sparse Feature Vector without Specifying Parameter

  • Huixia He,
  • Guiying Wei,
  • Sen Wu*,
  • Xiaonan Gao

DOI
https://doi.org/10.17559/TV-20200918143701
Journal volume & issue
Vol. 27, no. 6
pp. 1974 – 1981

Abstract


Parameter setting strongly affects algorithm performance in data mining. CABOSFV is an efficient clustering algorithm for binary data with sparse features, but its threshold parameter is difficult to specify. To remove this difficulty, this paper proposes CASP, a clustering algorithm based on sparse feature vector without specifying parameter. First, a method for calculating an upper limit of the threshold is defined, which determines the threshold range. Next, the data are sorted by a sparseness index, and clustering is carried out on the sorted data using an adjusted sparse feature vector. An interval search strategy then finds a suitable threshold within the defined range, and the clustering obtained with that threshold is returned as the result. Experiments on 7 UCI datasets show that CASP outperforms the baseline algorithms in both effectiveness and efficiency. CASP thus simplifies parameter selection while producing good clustering results quickly and stably, which demonstrates its practicality.
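
The pipeline described in the abstract (sort the data, cluster in one pass under a threshold, then search a threshold range) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract gives no formulas, so the set dissimilarity, the sort key standing in for the sparseness index, the `upper` bound, and the `score` callback are all hypothetical placeholders. Each object is represented as the set of attribute indices whose value is 1.

```python
from typing import Callable, List, Optional, Set, Tuple

Obj = Set[int]  # indices of the attributes whose value is 1


def set_dissimilarity(members: List[Obj]) -> float:
    """Assumed measure: differing attributes / (set size * shared attributes)."""
    shared = set.intersection(*members)
    differing = set.union(*members) - shared
    if not shared:
        return float("inf")  # nothing in common, so never merge
    return len(differing) / (len(members) * len(shared))


def one_pass_cluster(objects: List[Obj], threshold: float) -> List[List[int]]:
    """Scan the sorted objects once; return clusters of original indices."""
    # Assumed ordering: objects with more 1-valued attributes come first.
    order = sorted(range(len(objects)), key=lambda i: -len(objects[i]))
    clusters: List[List[int]] = []
    for i in order:
        best: Optional[List[int]] = None
        best_d = float("inf")
        for cluster in clusters:
            d = set_dissimilarity([objects[j] for j in cluster] + [objects[i]])
            # Join the cluster that stays most compact, if within the threshold.
            if d <= threshold and d < best_d:
                best, best_d = cluster, d
        if best is None:
            clusters.append([i])  # no admissible cluster: start a new one
        else:
            best.append(i)
    return clusters


def search_threshold(
    objects: List[Obj],
    upper: float,
    steps: int,
    score: Callable[[List[List[int]]], float],
) -> Tuple[float, List[List[int]]]:
    """Try evenly spaced thresholds in (0, upper]; keep the best-scoring one."""
    best_t, best_clusters, best_score = 0.0, [], None
    for k in range(1, steps + 1):
        t = upper * k / steps
        clusters = one_pass_cluster(objects, t)
        s = score(clusters)
        if best_score is None or s > best_score:
            best_t, best_clusters, best_score = t, clusters, s
    return best_t, best_clusters


if __name__ == "__main__":
    data = [{0, 1, 2}, {0, 1}, {0, 1, 2, 3}, {5, 6}, {5, 6, 7}, {5}]
    # Toy score (fewer clusters is better); a real criterion would differ.
    print(search_threshold(data, upper=1.0, steps=4, score=lambda c: -len(c)))
```

The toy score is only there to make the search loop runnable. Any real use would substitute a clustering quality criterion, and the upper bound would come from the calculation the paper defines rather than a hand-picked constant.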

Keywords