Jisuanji kexue yu tansuo (Jan 2020)

Clustering Algorithm for High-Dimensional Data Under New Dimensionality Reduction Criteria

  • WAN Jing, WU Fan, HE Yunbin, LI Song

DOI
https://doi.org/10.3778/j.issn.1673-9418.1902023
Journal volume & issue
Vol. 14, no. 1
pp. 96 – 107

Abstract

Principal component analysis (PCA) cannot prevent the loss of clustering accuracy that occurs after high-dimensional data are reduced. To address this, a new concept of attribute space is proposed. Combining attribute space with information entropy yields a dimensionality reduction criterion based on feature similarity, and this criterion is used to build a new dimensionality reduction algorithm, entropy-PCA (EN-PCA). Because each reduced feature is a linear combination of the original features, the results are hard to interpret and the input is inflexible; to address this, a sparse principal component algorithm based on ridge regression (ESPCA) is proposed. ESPCA takes the PCA reduction result as its input and obtains sparse results without iteration, which makes the solution both more flexible and faster. Finally, to address the slow convergence of genetic-algorithm clustering, the initialization, selection, crossover, mutation and other operations are improved on the reduced data, yielding a new clustering algorithm, genetic K-means algorithm++ (GKA++). Experimental analysis shows that EN-PCA is stable and that GKA++ performs well in both clustering effectiveness and efficiency.
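The abstract states that ESPCA takes the PCA result as input, uses ridge regression, and needs no iteration, but it does not give the sparsification rule. A minimal sketch of one way these ingredients fit together, assuming NumPy: regressing each principal-component score on the centered data with ridge regression, in closed form, returns a vector proportional to that component's loading, so normalizing it recovers the loading exactly; small entries are then zeroed by a threshold. The helper names `pca` and `ridge_sparse_loadings` and the parameters `lam` and `threshold` are hypothetical, and the thresholding step is an assumption, not necessarily the paper's method.

```python
import numpy as np


def pca(X, k):
    """Plain PCA via SVD of the centered data.

    Returns loadings V (d x k, unit-norm columns) and scores Z = Xc @ V (n x k).
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T
    return V, Xc @ V


def ridge_sparse_loadings(X, scores, lam=1.0, threshold=0.1):
    """Hypothetical non-iterative sparse loadings built from PCA scores.

    1. Ridge-regress every score column on centered X in closed form:
       B = (Xc^T Xc + lam * I)^{-1} Xc^T Z.
       For lam > 0 each column of B is a positive multiple of the PCA loading.
    2. Normalize each column to unit length (this recovers the PCA loading).
    3. Zero entries with |value| < threshold, then renormalize the columns.
    """
    Xc = X - X.mean(axis=0)
    d = Xc.shape[1]
    B = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ scores)
    B /= np.linalg.norm(B, axis=0, keepdims=True)
    B[np.abs(B) < threshold] = 0.0
    norms = np.linalg.norm(B, axis=0, keepdims=True)
    norms[norms == 0] = 1.0  # guard: avoid dividing an all-zero column by zero
    return B / norms
```

With `threshold=0.0` the function returns the ordinary PCA loadings, which is a convenient sanity check; raising the threshold trades explained variance for loadings that involve fewer original features and are therefore easier to interpret.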

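The abstract names the operators GKA++ improves (initialization, selection, crossover, mutation) without specifying them. The following is a generic genetic K-means sketch, not the paper's algorithm, assuming NumPy. Each chromosome is a set of k cluster centers and fitness is the within-cluster sum of squared errors (SSE). The k-means++ seeding is an assumption suggested by the "++" in the name; tournament selection, uniform crossover of centers, Gaussian mutation, elitism, and a one-step K-means operator (in the spirit of the original genetic K-means algorithm) are likewise illustrative choices. All function and parameter names are hypothetical.

```python
import numpy as np


def sq_dists(X, centers):
    """Squared Euclidean distance from every point to every center (n x k)."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)


def sse(X, centers):
    """Fitness to minimize: sum of squared distances to the nearest center."""
    return sq_dists(X, centers).min(axis=1).sum()


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: draw each new center with prob. proportional to D(x)^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = sq_dists(X, np.array(centers)).min(axis=1)
        total = d2.sum()
        idx = rng.integers(len(X)) if total == 0 else rng.choice(len(X), p=d2 / total)
        centers.append(X[idx])
    return np.array(centers)


def lloyd_step(X, centers):
    """One K-means step: assign points, move each non-empty center to its mean."""
    labels = sq_dists(X, centers).argmin(axis=1)
    new = centers.copy()
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members):
            new[j] = members.mean(axis=0)
    return new


def genetic_kmeans(X, k, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    pop = [lloyd_step(X, kmeans_pp_init(X, k, rng)) for _ in range(pop_size)]
    scale = X.std(axis=0)

    def tournament(fitness):
        # Size-2 tournament: the individual with lower SSE wins.
        a, b = rng.choice(pop_size, 2, replace=False)
        return pop[a] if fitness[a] < fitness[b] else pop[b]

    for _ in range(generations):
        fitness = np.array([sse(X, c) for c in pop])
        children = [pop[int(fitness.argmin())].copy()]  # elitism: keep the best
        while len(children) < pop_size:
            p1, p2 = tournament(fitness), tournament(fitness)
            mask = rng.random(k) < 0.5  # uniform crossover over center rows
            child = np.where(mask[:, None], p1, p2)
            mutate = rng.random(child.shape) < mutation_rate  # Gaussian mutation
            child = child + mutate * rng.normal(0.0, 1.0, child.shape) * scale * 0.1
            children.append(lloyd_step(X, child))  # K-means operator
        pop = children

    fitness = np.array([sse(X, c) for c in pop])
    best = pop[int(fitness.argmin())]
    return best, sq_dists(X, best).argmin(axis=1)
```

Because the elite individual is copied unchanged into every new generation, the best SSE never increases; the K-means step applied to each child is what usually speeds up convergence compared with a purely genetic search.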
Keywords