Journal of King Saud University: Computer and Information Sciences (Jul 2021)
Initial seed selection for K-modes clustering – A distance and density based approach
Abstract
In partitioning-based clustering algorithms, the initial seed artefacts play a vital role in the proper categorization of a given data set, so it is important to identify them well. We propose a density- and distance-based method that identifies seed artefacts from different clusters, which leads to more accurate clustering results. Our algorithm iteratively refines the search for initial seed artefacts until the sum of within-cluster errors, each normalized by its cluster's data size, reaches its minimum. Because the initial artefacts are selected from different clusters, this choice of seed artefacts guarantees a globally optimal clustering solution. We compare our results with random initialization and with the methods of Wu, Cao and Khan for initial seed artefact selection to show the efficacy of our method.
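
The general idea of combining density with distance can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper, whose details the abstract does not give. The sketch assumes categorical records compared by Hamming distance, takes a record's density to be its average attribute-value frequency, and scores candidate seeds by density multiplied by the distance to the nearest seed already chosen. The function names (hamming, density, select_seeds, normalized_within_error) are hypothetical, and the error measure is only one possible reading of the normalized within-cluster error mentioned in the abstract.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def density(records):
    """Average attribute-value frequency of each record (assumed density measure)."""
    n, m = len(records), len(records[0])
    counts = [Counter(r[j] for r in records) for j in range(m)]
    return [sum(counts[j][r[j]] for j in range(m)) / (n * m) for r in records]


def select_seeds(records, k):
    """Pick k seeds that are both dense and far from the seeds already chosen."""
    dens = density(records)
    # First seed: the densest record (ties go to the earliest record).
    chosen = [max(range(len(records)), key=lambda i: dens[i])]
    while len(chosen) < k:
        def score(i):
            # Density weighted by Hamming distance to the nearest chosen seed.
            return dens[i] * min(hamming(records[i], records[s]) for s in chosen)
        candidates = [i for i in range(len(records)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [records[i] for i in chosen]


def normalized_within_error(records, seeds):
    """Sum over clusters of (within-cluster Hamming error / cluster size)."""
    errors = [0] * len(seeds)
    sizes = [0] * len(seeds)
    for r in records:
        dists = [hamming(r, s) for s in seeds]
        c = dists.index(min(dists))  # assign the record to its nearest seed
        errors[c] += dists[c]
        sizes[c] += 1
    return sum(e / n for e, n in zip(errors, sizes) if n)


if __name__ == "__main__":
    # Toy data with two obvious groups (illustrative only, not from the paper).
    data = [
        ("a", "x", "p"), ("a", "x", "q"), ("a", "x", "p"),
        ("b", "y", "r"), ("b", "y", "s"), ("b", "y", "r"),
    ]
    seeds = select_seeds(data, k=2)
    print(seeds, normalized_within_error(data, seeds))
```

On the toy data the two seeds come from the two different groups, which is the property the abstract emphasizes. A full K-modes run would then use these seeds as its initial modes.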