SoftwareX (Jun 2022)

pyCLAMs: An integrated Python toolkit for classifiability analysis

  • Yinsheng Zhang,
  • Haiyan Wang,
  • Yongbo Cheng,
  • Xiaolin Qin

Journal volume & issue
Vol. 18
p. 101007

Abstract

In data-driven discriminative tasks, classifiability analysis is an often-neglected, implicit step. It answers a fundamental question: does the dataset possess sufficient between-class differences? To measure a dataset’s degree of classifiability, we developed pyCLAMs (Python package for CLassifiability Analysis Metrics). pyCLAMs integrates existing classifiability complexity metrics (e.g., Fisher discriminant ratio, overlapping region volume, distribution topology) and adds further metrics and statistics, such as BER (Bayes error rate, i.e., irreducible error), ES (effect size), Pearson’s r, Spearman’s rho, Kendall’s tau, IG (information gain, i.e., mutual information), ANOVA (analysis of variance), MANOVA (multivariate ANOVA), MWW (Mann–Whitney–Wilcoxon test), and KS (Kolmogorov–Smirnov test). The current version of pyCLAMs supports 68 metrics. We recommend that researchers use pyCLAMs for a preliminary assessment of their classification tasks.
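
As a rough illustration of what one such metric measures, the sketch below computes a per-feature Fisher discriminant ratio for two classes, taken here as (mu_a - mu_b)^2 / (var_a + var_b), and reports the largest value across features. It is not the pyCLAMs API: the function names `fisher_discriminant_ratio` and `max_fisher_ratio` are hypothetical, the use of population variance is an assumption, and the toy data are made up for the example.

```python
from statistics import mean, pvariance


def fisher_discriminant_ratio(class_a, class_b):
    """Fisher discriminant ratio of one feature for two classes.

    Assumed definition: (mu_a - mu_b)^2 / (var_a + var_b), with population
    variances. Larger values indicate better-separated classes.
    """
    mu_a, mu_b = mean(class_a), mean(class_b)
    denom = pvariance(class_a) + pvariance(class_b)
    if denom == 0:
        # Both classes are constant: perfectly separable if the means differ.
        return float("inf") if mu_a != mu_b else 0.0
    return (mu_a - mu_b) ** 2 / denom


def max_fisher_ratio(samples_a, samples_b):
    """Largest per-feature ratio; each argument is a list of sample rows."""
    columns_a = zip(*samples_a)  # transpose rows into feature columns
    columns_b = zip(*samples_b)
    return max(
        fisher_discriminant_ratio(col_a, col_b)
        for col_a, col_b in zip(columns_a, columns_b)
    )


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
class_a = [[0, 5], [1, 5], [2, 5]]
class_b = [[10, 5], [11, 5], [12, 5]]
separable = max_fisher_ratio(class_a, class_b)
overlapping = fisher_discriminant_ratio([0, 1, 2], [1, 2, 3])
```

A high ratio on at least one feature suggests the dataset carries usable between-class differences; a low ratio on every feature is a warning sign worth checking with additional metrics before training a classifier.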

Keywords