Jisuanji kexue (Jan 2023)

Survey on Hierarchical Clustering for Machine Learning

  • WANG Shaojiang, LIU Jia, ZHENG Feng, PAN Yicheng

DOI
https://doi.org/10.11896/jsjkx.211000185
Journal volume & issue
Vol. 50, no. 1
pp. 9 – 17

Abstract

Clustering analysis plays a key role in machine learning, data mining and the analysis of biological DNA information. Clustering algorithms can be categorized into flat clustering and hierarchical clustering. Flat clustering mostly divides the data set into K disjoint communities at a single level, but real communities have multi-level inclusion relations, so hierarchical clustering algorithms can provide more detailed analysis and better interpretability. Compared with flat clustering, research progress on hierarchical clustering has been slow. To address the problem of hierarchical clustering, this paper surveys a large number of related papers with respect to the selection of cost functions, the evaluation of clustering results and the performance of clustering algorithms. The evaluation indices for clustering results mainly include modularity, the Jaccard index, normalized mutual information, dendrogram purity, etc. Among flat clustering algorithms, the classical ones include the K-means algorithm, the label propagation algorithm, the DBSCAN algorithm, the spectral clustering algorithm and so on. Hierarchical clustering algorithms can be further classified into agglomerative clustering algorithms and divisive clustering algorithms. Divisive clustering algorithms include the bisecting K-means algorithm, the recursive sparsest cut algorithm, etc. Agglomerative clustering algorithms include the classical Louvain and BIRCH algorithms and the more recent HLP, PERCH and GRINCH algorithms. This paper also analyzes the advantages and disadvantages of these algorithms, and finally summarizes the whole paper.
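As an illustration of one of the evaluation indices named above, the following is a minimal sketch of a Jaccard index between two flat clusterings. It assumes the pair-counting variant (the fraction of point pairs grouped together in both clusterings among pairs grouped together in at least one); the survey itself may define the index differently. The function name `pair_jaccard` and the label lists are hypothetical and chosen only for this example.

```python
from itertools import combinations


def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two flat clusterings.

    labels_a[i] and labels_b[i] are the cluster labels assigned to point i
    by the two clusterings. This variant is an assumption made for the
    sketch, not necessarily the definition used in the survey.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")

    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1      # pair co-clustered in both clusterings
        elif same_a:
            only_a += 1    # pair co-clustered only in the first
        elif same_b:
            only_b += 1    # pair co-clustered only in the second

    denom = both + only_a + only_b
    # If no pair is co-clustered anywhere, the clusterings agree trivially.
    return both / denom if denom else 1.0


reference = [0, 0, 1, 1]   # hypothetical ground-truth communities
predicted = [0, 0, 1, 2]   # hypothetical algorithm output
print(pair_jaccard(reference, predicted))
```

The score depends only on which points are grouped together, so renaming the cluster labels does not change it. To compare hierarchical results with this sketch, one would first cut the hierarchy at a chosen level to obtain a flat clustering.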

Keywords