Applied Computational Intelligence and Soft Computing (Jan 2016)
A Semisupervised Cascade Classification Algorithm
Abstract
Classification is one of the most important tasks of data mining techniques, which have been adopted by several modern applications. The shortage of enough labeled data in the majority of these applications has shifted the interest towards using semisupervised methods. Under such schemes, the use of collected unlabeled data combined with a clearly smaller set of labeled examples leads to similar or even better classification accuracy against supervised algorithms, which use labeled examples exclusively during the training phase. A novel approach for increasing semisupervised classification using Cascade Classifier technique is presented in this paper. The main characteristic of Cascade Classifier strategy is the use of a base classifier for increasing the feature space by adding either the predicted class or the probability class distribution of the initial data. The classifier of the second level is supplied with the new dataset and extracts the decision for each instance. In this work, a self-trained NB∇C4.5 classifier algorithm is presented, which combines the characteristics of Naive Bayes as a base classifier and the speed of C4.5 for final classification. We performed an in-depth comparison with other well-known semisupervised classification methods on standard benchmark datasets and we finally reached to the point that the presented technique has better accuracy in most cases.