大数据 (Mar 2025)

Imbalanced data stream classification method with limited labels

  • LI Yanhong,
  • LI Zhihua,
  • ZHENG Jianxing,
  • BAI Hexiang,
  • GUO Xin

Journal volume & issue
Vol. 11
pp. 107 – 126

Abstract

Data stream classification is a crucial research area within data stream mining. Its core task is to swiftly capture concept drift in real-time incoming data streams and promptly adjust the classification model. Extreme learning machines offer advantages such as fast training and excellent generalization performance. However, existing data stream classification methods based on extreme learning machines often struggle to simultaneously address common challenges in data streams, such as multi-class imbalance, concept drift, and expensive labeling costs. For this reason, an imbalanced data stream classification method with limited labels was proposed. We defined a sample prediction certainty measure that combines the difference in predicted probabilities with information entropy, and introduced an uncertainty-based label request strategy. Furthermore, we defined a sample importance measure based on class imbalance ratios and sample prediction errors. We also proposed an update and reconstruction mechanism for the classifier based on a concept drift index. Comparative experiments on six synthetic data streams and three real data streams demonstrate that the proposed method outperforms six existing data stream classification methods in terms of classification performance.
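
The abstract does not give the formula for the prediction certainty measure, so the following is only a minimal sketch of how a probability margin and information entropy could be combined and used to decide when to request a label. The function names, the weight `alpha`, the linear combination, and the threshold are all assumptions made for illustration, not the authors' definitions.

```python
import math


def prediction_certainty(probs, alpha=0.5):
    """Hypothetical certainty score in [0, 1] for one predicted distribution.

    Combines the margin between the two largest class probabilities with
    one minus the normalized Shannon entropy. The weight `alpha` and the
    linear mix are illustrative assumptions, not the paper's formula.
    """
    if len(probs) < 2:
        raise ValueError("need at least two class probabilities")
    ranked = sorted(probs, reverse=True)
    margin = ranked[0] - ranked[1]
    # Shannon entropy, skipping zero entries (0 * log 0 is taken as 0).
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Divide by the maximum possible entropy so the value lies in [0, 1].
    normalized_entropy = entropy / math.log(len(probs))
    return alpha * margin + (1 - alpha) * (1 - normalized_entropy)


def should_request_label(probs, threshold=0.5, alpha=0.5):
    """Request a true label when certainty falls below an assumed threshold."""
    return prediction_certainty(probs, alpha) < threshold


if __name__ == "__main__":
    for probs in ([0.98, 0.01, 0.01], [0.34, 0.33, 0.33]):
        print(probs, should_request_label(probs))
```

In this sketch a near-uniform prediction scores close to zero and triggers a label request, while a confident prediction scores close to one and is left unlabeled, which is the general behavior an uncertainty-driven strategy aims for when labels are expensive.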

Keywords