IEEE Access (Jan 2020)

An Improved C4.5 Algorithm in Bagging Integration Model

  • Yu-Qing Song,
  • Xu Yao,
  • Zhe Liu,
  • Xianbao Shen,
  • Jingyi Mao

DOI
https://doi.org/10.1109/ACCESS.2020.3032291
Journal volume & issue
Vol. 8
pp. 206866 – 206875

Abstract

The C4.5 algorithm has three shortcomings: an overly wide range of candidate segmentation thresholds for continuous attributes, neglect of the combined influence of different attributes and of local subsets within the same attribute, and redundancy between attributes. For continuous attributes, the range of candidate segmentation thresholds is narrowed by sampling and supplementing thresholds near the transition boundaries of the attribute interval, that is, where adjacent values belong to different categories. The calculation of the C4.5 information gain is optimized by adding an attribute weight, expressed as the standardized Euclidean distance of the attribute's global and local factors. The influence of redundancy between attributes is greatly decreased by averaging the Gini index of the other attributes and adding a correction factor. The overall average improvement is 0.6% to 2.1% for the base classifier and 0.7% to 2.7% for the bagging integration classifier, which shows that this integration model can improve classification accuracy and supports its feasibility and reliability.
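
The idea of restricting candidate thresholds to class-transition boundaries can be illustrated with a minimal Python sketch. It is not the authors' procedure: it only keeps the midpoints between adjacent distinct attribute values where the class changes, and omits the sampling and threshold-supplement steps the abstract mentions. The function name `boundary_thresholds` and the toy data are assumptions made for illustration.

```python
from itertools import groupby


def boundary_thresholds(values, labels):
    """Return candidate split thresholds for one continuous attribute.

    Hypothetical helper (not from the paper): a midpoint between two
    adjacent distinct values is kept only if the two value groups are not
    both pure with the same class, i.e. only at class-transition boundaries.
    """
    # Sort samples by attribute value only (labels need not be orderable).
    pairs = sorted(zip(values, labels), key=lambda p: p[0])

    # Collapse equal values into (value, set of classes seen at that value).
    groups = [
        (value, {label for _, label in group})
        for value, group in groupby(pairs, key=lambda p: p[0])
    ]

    thresholds = []
    for (v1, classes1), (v2, classes2) in zip(groups, groups[1:]):
        mixed = len(classes1) > 1 or len(classes2) > 1
        if mixed or classes1 != classes2:
            thresholds.append((v1 + v2) / 2)
    return thresholds


if __name__ == "__main__":
    # Toy data: plain C4.5 would test all 4 midpoints; only 2 are boundaries.
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    labels = ["a", "a", "b", "b", "a"]
    print(boundary_thresholds(values, labels))
```

On the toy data, four midpoints between adjacent values exist, but only the two where the class label changes are retained, which is the kind of reduction in candidate thresholds the abstract describes.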

Keywords