Journal of Sensor and Actuator Networks (Jun 2024)
A Nature-Inspired Partial Distance-Based Clustering Algorithm
Abstract
In the rapidly advancing landscape of digital technologies, clustering plays a critical role in the domains of artificial intelligence and big data. Clustering is essential for extracting meaningful insights and patterns from large, intricate datasets. Despite the efficacy of traditional clustering techniques in handling diverse data types and sizes, they encounter challenges posed by the increasing volume and dimensionality of data, as well as the complex structures inherent in high-dimensional spaces. This research recognizes the constraints of conventional clustering methods, including sensitivity to initial centroids, dependence on prior knowledge of cluster counts, and scalability issues, particularly in large datasets and Internet of Things implementations. In response to these challenges, we propose a K-level clustering algorithm inspired by the collective behavior of fish locomotion. K-level introduces a novel clustering approach based on greedy merging driven by distances in stages. This iterative process efficiently establishes hierarchical structures without the need for exhaustive computations. K-level gives users enhanced control over computational complexity, enabling them to specify the number of clusters merged simultaneously. This flexibility ensures accurate and efficient hierarchical clustering across diverse data types, offering a scalable solution for processing extensive datasets within a reasonable timeframe. The internal validation metrics, including the Silhouette Score, Davies–Bouldin Index, and Calinski–Harabasz Index, are utilized to evaluate the K-level algorithm across various types of datasets. Additionally, comparisons are made with rivals in the literature, including UPGMA, CLINK, UPGMC, SLINK, and K-means. The experiments and analyses show that the proposed algorithm overcomes many of the limitations of existing clustering methods, presenting scalable and adaptable clustering in the dynamic landscape of evolving data challenges.
Keywords