IEEE Access (Jan 2024)

KIHCDP: An Incremental Hierarchical Clustering Approach for IoT Data Using Dirichlet Process

  • Abishi Chowdhury,
  • Amrit Pal,
  • Ashwin Raut,
  • Manish Kumar

DOI
https://doi.org/10.1109/ACCESS.2024.3385628
Journal volume & issue
Vol. 12
pp. 56019 – 56032

Abstract

Internet of Things (IoT) devices continuously produce vast amounts of data, which must be stored and processed efficiently to extract useful information. However, the models used to extract that information are often hindered by a lack of useful data and by the ever-changing distribution of the data. This paper introduces an incremental clustering technique for continuous data streams, based on a Dirichlet process, that handles the formation of clusters in streaming data. The approach has two parts. First, it starts from an estimated data distribution and allocates each incoming data point to that distribution. Second, it refines the estimated distribution after allocating the current point and as subsequent data points arrive. The influx of data makes it harder both to determine clusters for incoming points and to preserve the current clusters for improved decision-making. To cope with the growing volume of data, the proposed approach applies a selective elimination technique to both existing and incoming data. Its performance is assessed through experiments on benchmark datasets. The results show that the proposed model achieves a gain of 2% to 4% in clustering accuracy over existing state-of-the-art and recent adaptive clustering approaches under incremental data addition and variable clustering parameters. The proposed method also shows a gain of 2% to 20% in running time over the existing approaches, depending on the data reduction parameter. Furthermore, the findings indicate that a trade-off between accuracy and running time can be set by adjusting the elimination parameter according to the requirements of the application.
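
The two-part loop described in the abstract (allocate each incoming point to the current estimate, then refine that estimate) can be illustrated with a generic Dirichlet-process-style clusterer. The sketch below is not the KIHCDP algorithm: it is a greedy Chinese-restaurant-process assignment with spherical Gaussian clusters, and it omits the selective elimination step, whose details the abstract does not give. The class name IncrementalDPClusterer and the parameters alpha, sigma, and prior_sigma are assumptions made for illustration.

```python
import math


class IncrementalDPClusterer:
    """Hypothetical greedy CRP clusterer with spherical Gaussian clusters."""

    def __init__(self, alpha=1.0, sigma=1.0, prior_sigma=10.0):
        self.alpha = alpha              # concentration: larger values favor new clusters
        self.sigma = sigma              # assumed within-cluster standard deviation
        self.prior_sigma = prior_sigma  # assumed spread of cluster centers around the origin
        self.counts = []                # number of points assigned to each cluster
        self.means = []                 # running mean of each cluster

    @staticmethod
    def _log_gauss(x, mean, var):
        # Log density of an isotropic Gaussian with variance `var` per dimension.
        d = len(x)
        sq = sum((a - b) ** 2 for a, b in zip(x, mean))
        return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)

    def assign(self, x):
        # Part 1: score the point against every current cluster estimate.
        # The CRP prior weights an existing cluster by its size.
        scores = [
            math.log(n) + self._log_gauss(x, m, self.sigma ** 2)
            for n, m in zip(self.counts, self.means)
        ]
        # Score for opening a new cluster: prior predictive centered at the origin.
        new_var = self.sigma ** 2 + self.prior_sigma ** 2
        origin = [0.0] * len(x)
        scores.append(math.log(self.alpha) + self._log_gauss(x, origin, new_var))
        k = max(range(len(scores)), key=scores.__getitem__)

        # Part 2: refine the chosen cluster's estimate with the new point.
        if k == len(self.counts):
            self.counts.append(1)
            self.means.append(list(x))
        else:
            self.counts[k] += 1
            n = self.counts[k]
            self.means[k] = [m + (a - m) / n for m, a in zip(self.means[k], x)]
        return k
```

Calling assign once per arriving point returns that point's cluster index. Only per-cluster counts and means are stored, so memory grows with the number of clusters rather than with the length of the stream.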

Keywords