IEEE Access (Jan 2021)

A Clustering Analysis Method With High Reliability Based on Wilcoxon-Mann-Whitney Testing

  • Yuan Cheng,
  • Weinan Jia,
  • Ronghua Chi,
  • Ao Li

DOI
https://doi.org/10.1109/ACCESS.2021.3053244
Journal volume & issue
Vol. 9
pp. 19776 – 19787

Abstract

As a core step in clustering analysis, distance measurement directly influences clustering accuracy. Existing measurement methods are mostly based on cluster feature information. However, such features may be insufficient and can lose information about the data when a cluster contains a large number of objects. To improve measurement accuracy, we make full use of the distribution characteristics of the objects in each cluster: we use descriptive statistics and the Wilcoxon-Mann-Whitney rank sum test from nonparametric statistics to measure distances during clustering. Furthermore, we propose a two-stage clustering algorithm to improve clustering performance. With the proposed distance measurement method, the algorithm does not require the number of clusters to be assumed in advance, can discover clusters of arbitrary shape, and improves clustering accuracy. Experiments on multiple datasets, with comparisons against other clustering algorithms, illustrate the accuracy and efficiency of the proposed algorithm.
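
The abstract does not give the exact distance formula, so the following is only a minimal sketch of how a rank sum statistic could serve as a distance between two clusters. It assumes each cluster is summarized by a single numeric attribute, and the function names `mann_whitney_u` and `rank_distance`, as well as the normalization to the range [0, 1], are illustrative assumptions rather than the authors' definitions.

```python
def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic of xs against ys.

    Counts the pairs (x, y) with x > y; a tie counts as one half.
    Pairwise counting is O(len(xs) * len(ys)), which is fine for a sketch.
    """
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def rank_distance(xs, ys):
    """Hypothetical cluster distance in [0, 1] derived from U.

    0 means the two samples are evenly interleaved in rank order;
    1 means one sample lies entirely above the other.
    This normalization is an assumption, not taken from the paper.
    """
    n_pairs = len(xs) * len(ys)
    if n_pairs == 0:
        raise ValueError("both clusters must contain at least one object")
    u = mann_whitney_u(xs, ys)
    return abs(2.0 * u / n_pairs - 1.0)


if __name__ == "__main__":
    cluster_a = [1.0, 2.0, 3.0]
    cluster_b = [4.0, 5.0, 6.0]
    print(rank_distance(cluster_a, cluster_b))  # fully separated samples
    print(rank_distance(cluster_a, cluster_a))  # identical samples
```

Because the statistic depends only on the relative ordering of the objects, it reflects how the two clusters' values are distributed against each other rather than comparing a single summary such as a centroid.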

Keywords