IEEE Access (Jan 2021)
Ensemble Pruning of RF via Multi-Objective TLBO Algorithm and Its Parallelization on Spark
Abstract
Ensemble learning has been widely used in various fields. However, in a big data environment, too many base classifiers lengthen the classification time of an ensemble classifier, whereas removing base classifiers can lower its classification accuracy. Therefore, the multi-objective teaching-learning-based optimization (MO-TLBO) algorithm is used to perform ensemble pruning of random forest (RF), improving both the classification accuracy and the classification speed of RF. The MO-TLBO algorithm maximizes classification accuracy while minimizing classification time, and it can find a sub-forest with higher classification accuracy and faster classification speed. In addition, because ensemble pruning of RF via the MO-TLBO algorithm is computationally expensive in a big data environment, a vote set is constructed to accelerate the fitness evaluation process. On the Spark platform, the RF improved by the MO-TLBO algorithm (MO-TLBO-RF) is parallelized based on data parallelism, and a Shuffle optimization strategy is proposed to reduce the number of Shuffle operations during the execution of parallel MO-TLBO-RF. The proposed MO-TLBO-RF is applied to rolling bearing fault diagnosis. The experimental results show that the algorithm obtains an RF with high fault diagnosis accuracy and fast fault diagnosis speed, and that the vote set and the parallelization of MO-TLBO-RF greatly reduce the ensemble pruning time.
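The abstract does not say how the vote set is built. A minimal sketch, assuming it stores each tree's predicted label for every validation sample, shows why such a cache helps: scoring a candidate sub-forest (here a boolean mask over trees, one plausible encoding for a TLBO learner) becomes a majority vote over stored predictions instead of a fresh pass through the trees. The function names, the use of the selected tree count as a stand-in for classification time, and the sample data are hypothetical and are not the paper's implementation.

```python
import numpy as np


def build_vote_set(tree_predictions):
    """Stack each tree's predicted labels into an (n_trees, n_samples) matrix.

    Assumption: this is computed once, so evaluating a candidate sub-forest
    never re-runs the trees themselves.
    """
    return np.asarray(tree_predictions, dtype=np.int64)


def subforest_accuracy(vote_set, mask, y_true, n_classes):
    """Majority-vote accuracy of the trees selected by a boolean mask."""
    selected = vote_set[mask]  # rows of the chosen trees: (k, n_samples)
    if selected.shape[0] == 0:
        return 0.0
    # Votes per class for every sample: shape (n_classes, n_samples).
    counts = np.stack([(selected == c).sum(axis=0) for c in range(n_classes)])
    majority = counts.argmax(axis=0)  # ties resolve to the lowest class label
    return float((majority == y_true).mean())


def fitness(vote_set, mask, y_true, n_classes):
    """Two objectives: accuracy (maximize) and a time proxy (minimize).

    Hypothetical: classification time is approximated by the number of
    selected trees; the paper may measure time differently.
    """
    return subforest_accuracy(vote_set, mask, y_true, n_classes), int(mask.sum())


def dominates(a, b):
    """Pareto dominance on (accuracy, n_trees): higher accuracy, fewer trees."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


# Hypothetical predictions of 4 trees on 5 validation samples (2 classes).
votes = build_vote_set([
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
])
y = np.array([0, 1, 1, 0, 1])

full = fitness(votes, np.array([True, True, True, True]), y, 2)
sub = fitness(votes, np.array([True, False, False, False]), y, 2)
weak = fitness(votes, np.array([False, True, False, False]), y, 2)
```

In this toy case the one-tree sub-forest matches the full forest's accuracy with a quarter of the trees, so it Pareto-dominates the full forest; a multi-objective optimizer would keep such non-dominated masks as candidate pruned forests.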
Keywords