Clustering Algorithm for High-Dimensional Data Under New Dimensionality Reduction Criteria

WAN Jing, WU Fan, HE Yunbin, LI Song

doi:10.3778/j.issn.1673-9418.1902023

Jisuanji kexue yu tansuo (Jan 2020)

Affiliations

WAN Jing, WU Fan, HE Yunbin, LI Song: School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China

DOI: https://doi.org/10.3778/j.issn.1673-9418.1902023
Journal volume & issue: Vol. 14, no. 1
pp. 96 – 107

Abstract

Principal component analysis (PCA) cannot prevent the loss of clustering accuracy that follows dimensionality reduction of high-dimensional data. To address this, a new concept of attribute space is proposed. By combining attribute space with information entropy, a dimensionality reduction criterion based on feature similarity is constructed, and a new dimensionality reduction algorithm (entropy-PCA, EN-PCA) is proposed. Because each feature after dimensionality reduction is a linear combination of the original features, which leads to poor interpretability and inflexible input, a sparse principal component algorithm based on ridge regression (ESPCA) is also proposed. ESPCA takes the PCA dimensionality reduction result as its input and obtains sparse results without iteration, which makes the solution more flexible and faster. Finally, to address the slow convergence of genetic algorithm clustering, the initialization, selection, crossover, mutation and other operations are improved on the dimension-reduced data, and a new clustering algorithm (genetic K-means algorithm ++, GKA++) is proposed. Experimental analysis shows that the EN-PCA algorithm is stable and that the GKA++ algorithm performs well in terms of clustering effectiveness and efficiency.
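
The abstract does not give the update rules of EN-PCA, ESPCA or GKA++, so none of them is reproduced here. As a rough illustration of the baseline pipeline those algorithms build on, the sketch below runs plain PCA (keeping only the first principal component, found by power iteration) and then clusters the projected data with k-means using k-means++-style seeding. All function names, the synthetic two-blob data and the choice of a single component are assumptions made for this example. There is no entropy criterion, sparsity step or genetic operator in it.

```python
import math
import random


def top_component(rows, iters=200):
    """Mean and unit-length first principal axis, found by power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Arbitrary start vector; assumed not orthogonal to the leading axis.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, axis):
    """1-D coordinate of each row along the principal axis."""
    return [sum((r[j] - mean[j]) * axis[j] for j in range(len(axis)))
            for r in rows]


def kmeanspp_seeds(points, k, rng):
    """k-means++ seeding: sample new seeds proportional to squared distance."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        d2 = [min((p - s) ** 2 for s in seeds) for p in points]
        seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds


def kmeans_1d(points, k, rng, iters=50):
    """Lloyd iterations on 1-D points; returns one cluster label per point."""
    centers = kmeanspp_seeds(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: (p - centers[c]) ** 2)
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = sum(members) / len(members)
    return labels


# Synthetic data (an assumption for illustration): two blobs in 5 dimensions.
rng = random.Random(0)
blob_a = [[rng.gauss(0.0, 0.3) for _ in range(5)] for _ in range(20)]
blob_b = [[rng.gauss(4.0, 0.3) for _ in range(5)] for _ in range(20)]
data = blob_a + blob_b

mean, axis = top_component(data)
coords = project(data, mean, axis)
labels = kmeans_1d(coords, 2, rng)
```

On this toy input the two blobs are far apart along the first principal axis, so the 1-D projection is enough to separate them. On real high-dimensional data, several components would normally be kept.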

Published in Jisuanji kexue yu tansuo

ISSN: 1673-9418 (Print)
Publisher: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://fcst.ceaj.org
