IEEE Access (Jan 2023)

Robust Model Design by Comparative Evaluation of Clustering Algorithms

  • Xiaopeng Chen,
  • Chanseok Park,
  • Xuehong Gao,
  • Bosung Kim

DOI
https://doi.org/10.1109/ACCESS.2023.3306023
Journal volume & issue
Vol. 11
pp. 88135 – 88151

Abstract

The K-means algorithm, widely used in cluster analysis, is a centroid-based clustering method known for its high efficiency and scalability. In realistic settings, however, data are susceptible to contamination caused by outliers and distributional departures, which can distort or invalidate the clustering results of K-means. In this paper, we introduce three alternative algorithms, K-weighted-medians, K-weighted-L2-medians, and K-weighted-HLs, to address these issues for weighted data. The impact of contamination is investigated by examining its effect on the estimation of the optimal cluster centroids. We explore the robustness of the clustering algorithms from the perspective of the breakdown point, and then conduct experiments on simulated and real datasets to evaluate their performance using two new numerical metrics: relative efficiencies based on generalized variance and on average Euclidean distance. The results demonstrate the effectiveness of the proposed K-weighted-HLs algorithm, which outperforms the other algorithms under both types of contamination.
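
To illustrate why the choice of centroid estimator matters under contamination, the following minimal sketch compares a weighted mean (the K-means style centroid) with a weighted median on a one-dimensional cluster containing a single outlier. It is not the paper's implementation: the function names `weighted_mean` and `weighted_median`, the "lower weighted median" convention, and the toy data are all assumptions made for illustration only.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: the centroid estimate used by K-means."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Lower weighted median (an assumed convention): the smallest value
    whose cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(w for _, w in pairs)
    running = 0.0
    for v, w in pairs:
        running += w
        if running >= half:
            return v
    raise ValueError("values must be non-empty")


# Toy one-dimensional cluster with equal weights (hypothetical data).
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
contaminated = [1.0, 2.0, 3.0, 4.0, 1000.0]  # last point replaced by an outlier
weights = [1.0] * 5

print(weighted_mean(clean, weights), weighted_median(clean, weights))
print(weighted_mean(contaminated, weights), weighted_median(contaminated, weights))
```

In this toy case one outlier moves the mean from 3.0 to 202.0 while the median stays at 3.0, which is the intuition behind comparing estimators by their breakdown point.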

Keywords