IEEE Access (Jan 2022)

A Novel Method of Clustering Using a Stochastic Approach

  • Gabiriele Bulivou,
  • Karuna G. Reddy,
  • M. G. M. Khan

DOI
https://doi.org/10.1109/ACCESS.2022.3219457
Journal volume & issue
Vol. 10
pp. 117925 – 117943

Abstract

Read online

The cluster analysis of real-life data often encounters the challenges of noisy data or may rely heavily on the uncertainty of the main clustering variable owing to its stochastic nature, which has a potential influence on its performance. In this paper, we propose a novel clustering technique that is efficient in dealing with noise and uncertainty in a dataset by adopting a stochastic approach that uses realistic values of data points by assuming a continuous probability distribution instead of exact values. By estimating the best-fit probability distribution of the clustering variable, the proposed method formulates the problem of determining the most homogeneous clusters by determining the optimum cluster partitions (OCP) as a mathematical programming problem (MPP). A computer-intensive dynamic programming technique was used to solve the MPP and determine the OCP, which minimized the sum of the weighted intracluster standard deviations. The proposed technique is then demonstrated in this study using univariate data that follows a normal distribution, which is a symmetric distribution, as well as the Weibull distribution, which is a skewed distribution. Numerical examples were also presented to illustrate the computational details of the proposed method. Finally, using both simulated and real datasets, a comparative analysis of the effectiveness of the proposed technique was performed against four advanced clustering methods: k-means, fuzzy c-means, expectation maximization, and Genie++ hierarchical clustering. The results reveal that the proposed method works well and produces more efficient clusters than other methods.

Keywords