Mathematics (Feb 2024)

Reliability of Partitioning Metric Space Data

  • Yariv N. Marmor,
  • Emil Bashkansky

DOI
https://doi.org/10.3390/math12040603
Journal volume & issue
Vol. 12, no. 4
p. 603

Abstract

Sorting or categorizing objects, or information about them, into clusters according to given criteria is a fundamental procedure in data analysis. Where a distance metric can be determined for any pair of objects, the significance and reliability of the separation can be evaluated by calculating the separation/segregation power (SP) index proposed herein. This index is the ratio of the average inter-cluster distance to the average intra-cluster distance and is independent of the scale parameter. The calculated SP value is then compared with its statistical distribution, obtained by a simulation study for the given partition under the homogeneity null hypothesis, and a conclusion is drawn using standard statistical procedures. The proposed concept is illustrated with three examples representing different types of objects under study. Some general considerations are given regarding the nature of the SP distribution under the null hypothesis and its dependence on the number of divisions and the amount of data within them. A detailed modus operandi (working method) for analyzing a metric data partition is also offered.
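
The SP ratio and its comparison against a simulated null distribution can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `sp_index` and `sp_permutation_test` are hypothetical, the null distribution is approximated here by randomly reshuffling the cluster labels (an assumed stand-in for the simulation study described in the paper), and the one-sided p-value convention is likewise an assumption.

```python
import itertools
import random


def sp_index(dist, labels):
    """Ratio of the mean inter-cluster distance to the mean intra-cluster distance.

    dist   : square, symmetric matrix (list of lists) of pairwise distances
    labels : cluster label of each object, in the same order as dist
    """
    inter, intra = [], []
    for i, j in itertools.combinations(range(len(labels)), 2):
        # A pair is "intra" if both objects share a label, otherwise "inter".
        (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    if not inter or not intra:
        raise ValueError("need at least one inter pair and one intra pair")
    mean_intra = sum(intra) / len(intra)
    if mean_intra == 0:
        raise ValueError("mean intra-cluster distance is zero")
    return (sum(inter) / len(inter)) / mean_intra


def sp_permutation_test(dist, labels, n_sim=2000, seed=0):
    """Return the observed SP and an estimated one-sided p-value.

    ASSUMPTION: homogeneity is simulated by shuffling labels, which keeps
    the cluster sizes of the given partition fixed. The paper's own
    simulation scheme may differ.
    """
    rng = random.Random(seed)
    observed = sp_index(dist, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if sp_index(dist, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_sim + 1)


# Toy data (hypothetical): six points on a line, split into two groups.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = [0, 0, 0, 1, 1, 1]
dist = [[abs(a - b) for b in points] for a in points]
sp, p_value = sp_permutation_test(dist, labels)
print(f"SP = {sp:.3f}, estimated p-value = {p_value:.4f}")
```

Because SP is a ratio of two averages of distances, multiplying every distance by the same constant leaves it unchanged, which is the scale independence noted in the abstract. A small p-value suggests that the observed separation is unlikely under homogeneity. With only six objects the number of distinct label arrangements is small, so the toy p-value is coarse.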

Keywords