Baghdad Science Journal (Sep 2024)
Enhancing Fuzzy C-Means Clustering with a Novel Standard Deviation Weighted Distance Measure
Abstract
This paper presents a new approach to the Fuzzy C-Means (FCM) algorithm, one of the most important and widely used algorithms for handling uncertainty in cluster formation according to overlap ratios. A major problem with this algorithm is its primary reliance on the Euclidean distance measure, which by nature makes the resulting clusters spherical and therefore unable to capture complex or overlapping cases. This paper therefore proposes a new distance measure: we derive a formula for the variance of the fuzzy cluster and use it as a weight in the Euclidean distance formula, giving the Weighted Euclidean Distance (WED). Moreover, the partition matrix is computed using the K-Means algorithm, creating a hybrid environment between the fuzzy algorithm and the crisp algorithm. To verify the proposal, an experimental simulation was carried out, and the method was then applied to real environmental data from the physical and chemical examination of water at testing stations in Basra Governorate. The experimental results show that the proposed WED measure had the advantage in improving the performance of the HFCM algorithm on the criteria (Obj_Fun, Iteration, Min_optimization, good fit clustering and overlap) when c = 2, 3. Based on the simulation results, c = 2 was chosen to form groups for the real data, which yielded the best objective function values (23.93, 22.44, 18.83) at degrees of fuzziness (1.2, 2, 2.8). At the degree of fuzziness m = 3.6, the objective function for Euclidean Distance (ED) was the lowest, but the criteria were (Iter. = 2, Min_optimization = 0 and ), which confirms that WED is the best.
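To make the idea of a variance-weighted distance concrete, the following is a minimal sketch and not the paper's derived formula, which the abstract does not state. It assumes that each feature's squared difference is divided by the fuzzy variance of the cluster along that feature, and it uses the standard FCM membership update. The function names (fuzzy_variance, weighted_euclidean, update_memberships) and the eps guard are hypothetical and introduced only for illustration.

```python
import math


def fuzzy_variance(points, memberships, center, m):
    """Per-feature variance of one fuzzy cluster, weighted by membership**m."""
    total = sum(u ** m for u in memberships)
    return [
        sum((u ** m) * (x[d] - center[d]) ** 2
            for x, u in zip(points, memberships)) / total
        for d in range(len(center))
    ]


def weighted_euclidean(x, center, variance, eps=1e-12):
    """Euclidean distance with each feature scaled by 1/variance.

    ASSUMPTION: inverse-variance weighting is an illustrative choice;
    the paper's actual weight formula may differ.
    """
    return math.sqrt(sum((xi - ci) ** 2 / (v + eps)
                         for xi, ci, v in zip(x, center, variance)))


def update_memberships(distances, m):
    """Standard FCM membership update for one point, given its distance
    to every cluster center and the fuzziness degree m > 1."""
    # A point lying exactly on one or more centers shares full membership there.
    on_center = [d == 0.0 for d in distances]
    if any(on_center):
        n = sum(on_center)
        return [1.0 / n if hit else 0.0 for hit in on_center]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in distances)
            for dj in distances]
```

Under this assumption, a cluster that is stretched along one feature has a large variance there, so differences along that feature count for less, and the cluster is no longer forced into a spherical shape. In a hybrid setup such as the one described above, the initial memberships could come from a K-Means hard assignment (1 for the assigned cluster, 0 otherwise) before the fuzzy updates begin.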
Keywords