Feature Selection Combining Artificial Bee Colony with K-means Clustering

SUN Lin, LIU Menghan, XUE Zhan’ao

doi:10.3778/j.issn.1673-9418.2212075

Jisuanji kexue yu tansuo (Jan 2024)

Affiliations

SUN Lin, LIU Menghan, XUE Zhan’ao: 1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China 2. School of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007, China

DOI: https://doi.org/10.3778/j.issn.1673-9418.2212075
Journal volume & issue: Vol. 18, no. 1
pp. 93 – 110

Abstract

K-means clustering is a simple, efficient statistical analysis method that converges quickly and is easy to implement. However, the traditional K-means algorithm is sensitive to the choice of initial cluster centers and easily falls into a local optimum, and most unsupervised feature selection algorithms tend to ignore the relationships between features. To address these issues, this paper proposes a feature selection algorithm that combines artificial bee colony with K-means clustering. Firstly, so that samples within the same cluster are highly similar and samples in different clusters are dissimilar, a new fitness function is constructed from the degree of aggregation within clusters and the degree of dispersion between clusters, which better reflects the characteristics of each sample; a new expression for the probability of a honey source being selected is then constructed. Secondly, a weight that decreases gradually as the number of iterations increases is designed, and a honey source position update expression is proposed that dynamically narrows the search range of the bee colony. Thirdly, to overcome the limitation of the traditional Euclidean distance, which considers only the cumulative difference between vectors, a weighted Euclidean distance expression is constructed that accounts for both the different degrees of influence of the samples and the similarity between samples. Finally, the standard deviation and the distance correlation coefficient are introduced to define feature discrimination and feature representativeness, and their product is used to measure feature importance. Experimental results show that the proposed algorithm accelerates the convergence of the artificial bee colony algorithm, improves the clustering performance of the K-means algorithm, and effectively improves the classification performance of feature selection.
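The feature importance measure described in the abstract (discrimination times representativeness) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following is an assumption rather than the authors' definition: discrimination is taken as the per-feature standard deviation, and representativeness as the mean sample distance correlation between a feature and every other feature. The function names distance_correlation and feature_importance are hypothetical.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays.

    Returns a value in [0, 1]; 0.0 is returned when either input is constant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute-difference (distance) matrices.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = max((A * B).mean(), 0.0)  # clip tiny negative rounding errors
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom <= 0.0:
        return 0.0
    return float(np.sqrt(dcov2 / denom))


def feature_importance(X):
    """Score each column of X as (standard deviation) * (mean distance
    correlation with the other columns).

    ASSUMPTION: this product form follows the abstract's wording only;
    the paper's actual definitions may differ.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    discrimination = X.std(axis=0)
    representativeness = np.zeros(n_features)
    for j in range(n_features):
        others = [
            distance_correlation(X[:, j], X[:, k])
            for k in range(n_features)
            if k != j
        ]
        representativeness[j] = np.mean(others) if others else 0.0
    return discrimination * representativeness
```

Under these assumptions, a constant feature scores zero because its standard deviation is zero, and features would normally be rescaled to a common range first so that the standard deviations are comparable.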

feature selection; artificial bee colony; K-means clustering; feature importance

Published in Jisuanji kexue yu tansuo

ISSN: 1673-9418 (Print)
Publisher: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://fcst.ceaj.org
