An oversampling algorithm of multi-label data based on cluster-specific samples and fuzzy rough set theory

Jinming Liu; Kai Huang; Chen Chen; Jian Mao

doi:10.1007/s40747-024-01498-w

Complex & Intelligent Systems (Jun 2024)

Affiliations

Jinming Liu: College of Computer Engineering, Jimei University
Kai Huang: College of Computer Engineering, Jimei University
Chen Chen: College of Electronic and Information Engineering, Tongji University
Jian Mao: College of Computer Engineering, Jimei University

DOI: https://doi.org/10.1007/s40747-024-01498-w
Journal volume & issue: Vol. 10, no. 5
pp. 6267 – 6282

Abstract

Imbalanced class distributions are common in real-world scenarios, including datasets with multiple labels. One widely acknowledged approach to addressing imbalanced distributions is oversampling, a technique that both balances the class distribution and improves the effectiveness of classification models. However, generating synthetic data for multi-label datasets is complicated by the presence of multiple label sets: each synthetic sample must be both placed and labeled with care. We propose MLCSMOTE-FRST, an algorithm for synthetic data generation based on label-specific clustering and fuzzy rough set theory. Clusters specific to each label provide the generation ratios and the dependency samples, with a focus on the overall label distribution and the distribution within each cluster. The labels of a synthetic sample are supported by intra-cluster positive samples, determined using fuzzy rough set theory, which helps to capture the consensus label set. Experimental results on multi-label datasets using four classifiers demonstrate the effectiveness of the proposed method in terms of macro-F1 and micro-F1 scores.
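The two evaluation metrics named in the abstract weight labels differently, which matters for imbalanced data. The following is a minimal sketch of their standard definitions in plain Python, not the authors' evaluation code. The function names, the toy label matrices, and the convention of scoring a label with no true or predicted positives as 0 are all assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = samples, columns = labels, entries 0 or 1


def label_counts(y_true: Matrix, y_pred: Matrix, label: int) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one label."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        t, p = true_row[label], pred_row[label]
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 here when the denominator is 0 (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true: Matrix, y_pred: Matrix) -> Tuple[float, float]:
    """Macro-F1 averages per-label F1; micro-F1 pools the counts over all labels first."""
    n_labels = len(y_true[0])
    counts = [label_counts(y_true, y_pred, j) for j in range(n_labels)]
    macro = sum(f1(*c) for c in counts) / n_labels
    micro = f1(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return macro, micro


# Toy data: label 0 is frequent, label 1 is rare and never predicted.
y_true = [[1, 0], [1, 0], [1, 1], [1, 0]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 0]]

macro, micro = macro_micro_f1(y_true, y_pred)
print(f"macro-F1 = {macro:.3f}, micro-F1 = {micro:.3f}")
```

In this toy case the missed minority label halves the macro-F1 while the micro-F1 stays high, which is one reason both scores are commonly reported together when evaluating methods for imbalanced multi-label data.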

Published in Complex & Intelligent Systems

ISSN: 2199-4536 (Print); 2198-6053 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.springer.com/journal/40747
