Journal of Harbin University of Science and Technology (Aug 2018)
An Improved Naive Bayesian Text Classification Algorithm Based on Weighted Features and Its Complementary Set
Abstract
When training samples are distributed unevenly and sparsely across classes, the features of smaller classes cannot be adequately expressed and are submerged by those of larger classes. To address this class-imbalance problem, a new algorithm, TFWCNB (TF-IDF weighted complement Naïve Bayes), is proposed. TFWCNB improves complement Naïve Bayes with weighted features, using the TF-IDF algorithm to calculate the weight of each feature word in the current document. In addition, it uses the features of the current class's complementary set to represent the features of the current class. Combined with the feature-word weights, this counters the classifier's tendency to favor larger classes and ignore smaller ones. Experimental comparisons with traditional Naïve Bayes and complement Naïve Bayes show that TFWCNB performs best when the sample set is unevenly distributed: its classification precision, recall, and g-mean reach 82.92%, 84.6%, and 88.76%, respectively.
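The idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tfidf_vectors`, `train_complement_nb`, `predict`), the smoothed IDF variant, the Laplace smoothing parameter `alpha`, and the toy corpus are all assumptions made for illustration. The class parameters follow the standard complement Naïve Bayes estimate, with TF-IDF weights standing in for raw term counts; the exact weighting used by TFWCNB may differ.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Turn tokenized documents into {term: tf-idf weight} dictionaries."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF; this particular variant is an assumption of the sketch.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (cnt / len(doc)) * idf[t] for t, cnt in tf.items()})
    return vectors, idf


def train_complement_nb(vectors, labels, alpha=1.0):
    """Estimate log-parameters for each class from its complementary set."""
    vocab = {t for v in vectors for t in v}
    classes = sorted(set(labels))
    total = defaultdict(float)
    per_class = {c: defaultdict(float) for c in classes}
    for v, y in zip(vectors, labels):
        for t, w in v.items():
            total[t] += w
            per_class[y][t] += w
    log_theta = {}
    for c in classes:
        # Weight mass of each term in all documents that are NOT in class c.
        comp = {t: max(0.0, total[t] - per_class[c].get(t, 0.0)) for t in vocab}
        denom = sum(comp.values()) + alpha * len(vocab)
        log_theta[c] = {t: math.log((comp[t] + alpha) / denom) for t in vocab}
    return log_theta


def predict(log_theta, idf, tokens):
    """Pick the class whose complement explains the document worst."""
    tf = Counter(t for t in tokens if t in idf)  # ignore unseen terms
    n = len(tokens) or 1
    weights = {t: (cnt / n) * idf[t] for t, cnt in tf.items()}
    scores = {
        c: sum(w * lt[t] for t, w in weights.items())
        for c, lt in log_theta.items()
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical, deliberately unbalanced toy corpus: 4 "sport", 1 "finance".
    docs = [
        ["goal", "match", "team"],
        ["team", "win", "match"],
        ["goal", "score", "team"],
        ["match", "league", "goal"],
        ["stock", "market", "bank"],
    ]
    labels = ["sport", "sport", "sport", "sport", "finance"]
    vectors, idf = tfidf_vectors(docs)
    model = train_complement_nb(vectors, labels)
    print(predict(model, idf, ["bank", "stock"]))
    print(predict(model, idf, ["team", "goal"]))
```

Because the parameters of each class are estimated from all other classes, a small class is described by a large amount of data (its complement), which is the intuition behind using the complementary set for unbalanced corpora.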
Keywords