Feature-Based Complexity Measure for Multinomial Classification Datasets

Kyle Erwin; Andries Engelbrecht

doi:10.3390/e25071000

Entropy (Jun 2023)


Affiliations

Kyle Erwin: Computer Science Division, Stellenbosch University, Stellenbosch 7600, South Africa
Andries Engelbrecht: Computer Science Division, Stellenbosch University, Stellenbosch 7600, South Africa

Journal volume & issue: Vol. 25, no. 7
p. 1000

Abstract


Machine learning algorithms are frequently used for classification problems on tabular datasets. To make informed decisions about model selection and design, it is crucial to gain meaningful insights into the complexity of these datasets. Feature-based complexity measures are a set of complexity measures that evaluate how useful features are at discriminating between instances of different classes. This paper, however, shows that existing feature-based measures are inadequate for accurately measuring the complexity of various synthetic classification datasets, particularly those with multiple classes. This paper proposes a new feature-based complexity measure, called the F5 measure, which evaluates the discriminative power of features for each class by identifying long sequences of uninterrupted instances of the same class. It is shown that the F5 measure better represents the feature complexity of a dataset.
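The core idea named in the abstract, finding long uninterrupted sequences of same-class instances along a feature, can be illustrated with a short Python sketch. This is not the F5 measure as defined in the paper: the helper names `longest_class_runs` and `run_fraction_per_class`, the handling of tied feature values, and the normalisation of run length by class size are all hypothetical assumptions made purely for illustration.

```python
from itertools import groupby


def longest_class_runs(values, labels):
    """Hypothetical helper (not from the paper): sort instances by one
    feature's values and return, for each class, the length of its longest
    uninterrupted run of consecutive instances.

    Assumption: tied feature values keep their input order, because
    Python's sort is stable; the paper may treat ties differently.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    sorted_labels = [labels[i] for i in order]
    runs = {}
    # groupby yields maximal blocks of consecutive equal labels.
    for label, group in groupby(sorted_labels):
        length = sum(1 for _ in group)
        runs[label] = max(runs.get(label, 0), length)
    return runs


def run_fraction_per_class(values, labels):
    """Hypothetical normalisation (an assumption, not the F5 formula):
    longest run divided by class size, so 1.0 means the feature places
    the whole class in a single contiguous block."""
    labels = list(labels)
    runs = longest_class_runs(values, labels)
    return {c: runs[c] / labels.count(c) for c in runs}


if __name__ == "__main__":
    # Class "c" is fully contiguous along this feature; "a" and "b" interleave.
    print(run_fraction_per_class([1, 2, 3, 4, 5, 6], ["a", "b", "a", "b", "c", "c"]))
```

Running the example prints `{'a': 0.5, 'b': 0.5, 'c': 1.0}`: the feature separates class `c` completely, while classes `a` and `b` are interleaved and therefore only half of each class forms an uninterrupted run. Scoring each class separately in this way is what lets a per-class view expose a feature that is useful for some classes but not others, which is the multiclass weakness the abstract attributes to existing measures.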

Published in Entropy

ISSN: 1099-4300 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Astronomy: Astrophysics; Science: Physics
Website: http://www.mdpi.com/journal/entropy
