IEEE Access (Jan 2023)
Streaming Data Classification Based on Hierarchical Concept Drift and Online Ensemble
Abstract
To improve the performance of online learning on streaming data whose distribution changes in real time, this paper proposes a streaming data classification algorithm based on hierarchical concept drift and online ensemble (SCHCDOE). A concept drift index is calculated from each newly arrived data instance, and the stream is assigned to one of three states: stable, concept drift warning, or concept drift occurrence. In the stable state, the classifier is not updated. In the concept drift warning state, online ensemble learning with the random subspace method performs feature selection and updates the classifier efficiently. In the concept drift occurrence state, an anomaly detection mechanism removes abnormal data, and online ensemble learning is combined with incremental learning. Both the local information and the global distribution information of the stream are used to train the model, so that it can respond quickly after concept drift occurs. Experiments on both synthetic and real datasets show that the proposed algorithm improves classification accuracy and adaptability to concept drift compared with other classic algorithms.
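The three-state scheme can be illustrated with a minimal sketch. The abstract does not define the concept drift index, so the sketch substitutes a DDM-style (Drift Detection Method) test on the running error rate; this is an assumption, not the SCHCDOE index. The names `StreamState`, `ErrorRateDriftMonitor`, `ACTIONS`, and `track_states`, as well as the threshold values, are hypothetical and chosen only for illustration.

```python
import math
from enum import Enum


class StreamState(Enum):
    STABLE = "stable"
    WARNING = "warning"
    DRIFT = "drift"


# Update strategy per state, paraphrasing the abstract (labels are illustrative).
ACTIONS = {
    StreamState.STABLE: "keep classifier unchanged",
    StreamState.WARNING: "online ensemble update with random subspaces",
    StreamState.DRIFT: "remove anomalies, then ensemble + incremental learning",
}


class ErrorRateDriftMonitor:
    """DDM-style monitor, a stand-in for the paper's concept drift index."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_instances=30):
        self.warning_level = warning_level  # assumed threshold multiplier
        self.drift_level = drift_level      # assumed threshold multiplier
        self.min_instances = min_instances  # warm-up length before testing
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.best_sum = float("inf")  # smallest p + s observed so far
        self.best_p = 0.0
        self.best_s = 0.0

    def update(self, misclassified):
        """Consume one prediction outcome and return the current stream state."""
        self.n += 1
        self.errors += int(misclassified)
        p = self.errors / self.n                # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)   # its standard deviation
        if self.n < self.min_instances:
            return StreamState.STABLE
        if p + s < self.best_sum:
            self.best_sum, self.best_p, self.best_s = p + s, p, s
        if p + s > self.best_p + self.drift_level * self.best_s:
            return StreamState.DRIFT
        if p + s > self.best_p + self.warning_level * self.best_s:
            return StreamState.WARNING
        return StreamState.STABLE


def track_states(error_flags, monitor=None):
    """Map a sequence of per-instance error flags to stream states."""
    monitor = monitor or ErrorRateDriftMonitor()
    states = []
    for flag in error_flags:
        state = monitor.update(flag)
        states.append(state)
        if state is StreamState.DRIFT:
            monitor.reset()  # start fresh statistics for the new concept
    return states


if __name__ == "__main__":
    # 10% errors for 100 instances, then a burst of consecutive errors.
    flags = [i % 10 == 9 for i in range(100)] + [True] * 20
    for i, state in enumerate(track_states(flags)):
        if state is not StreamState.STABLE:
            print(i, state.value, "->", ACTIONS[state])
```

In this sketch the warning state is reached before the drift state as errors accumulate, which matches the hierarchical idea of applying a cheaper update first and a fuller relearning step only once drift is confirmed.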
Keywords