Electronic Research Archive (Apr 2024)

CCkEL: Compensation-based correlated k-labelsets for classifying imbalanced multi-label data

  • Qianpeng Xiao,
  • Changbin Shao,
  • Sen Xu,
  • Xibei Yang,
  • Hualong Yu

DOI
https://doi.org/10.3934/era.2024139
Journal volume & issue
Vol. 32, no. 5
pp. 3038 – 3058

Abstract

Imbalanced data distribution and label correlation are two intrinsic characteristics of multi-label data: instances associated with certain labels may be sparse, and some labels may be correlated with others, both of which pose challenges for traditional machine learning techniques. To simultaneously adapt to imbalanced data distribution and label correlation, this study proposes a novel algorithm called compensation-based correlated k-labelsets (CCkEL). First, for each label, CCkEL selects the k-1 most strongly correlated labels in the label space to constitute multiple correlated k-labelsets; this improves its efficiency in comparison with the random k-labelsets (RAkEL) algorithm. Then, CCkEL transforms each k-labelset into a multiclass problem. Finally, it uses a fast decision output compensation strategy to address class imbalance in the decoded multi-label decision space. We compared CCkEL with multiple popular multi-label imbalance learning algorithms on 10 benchmark multi-label datasets, and the results show its effectiveness and superiority.
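
The first two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how label correlation is measured, so absolute Pearson correlation is assumed here, and the function names correlated_k_labelsets and to_multiclass are hypothetical. The decision output compensation step is not sketched, because the abstract gives no details of it.

```python
import numpy as np

def correlated_k_labelsets(Y, k):
    """Build one k-labelset per label from a binary label matrix Y
    of shape (n_samples, n_labels)."""
    n_labels = Y.shape[1]
    # Assumption: absolute Pearson correlation between label columns.
    corr = np.abs(np.corrcoef(Y, rowvar=False))
    corr = np.nan_to_num(corr)        # constant columns yield NaN
    np.fill_diagonal(corr, -1.0)      # a label never selects itself
    labelsets = []
    for j in range(n_labels):
        # Indices of the k-1 labels most strongly correlated with label j.
        partners = np.argsort(-corr[j], kind="stable")[:k - 1]
        labelsets.append(tuple(sorted([j, *partners.tolist()])))
    return labelsets

def to_multiclass(Y, labelset):
    """Label-powerset encoding: map each row's k-bit pattern on the
    chosen labels to a single integer class."""
    weights = 1 << np.arange(len(labelset))
    return Y[:, list(labelset)] @ weights

# Toy data: 6 instances, 3 labels.
Y = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 0, 1],
              [0, 0, 0],
              [1, 1, 0],
              [0, 1, 1]])
labelsets = correlated_k_labelsets(Y, k=2)
targets = to_multiclass(Y, labelsets[0])
```

Each resulting integer target vector could then be passed to any multiclass classifier, one classifier per k-labelset, with the per-label decisions decoded from the predicted class afterwards.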

Keywords