IEEE Access (Jan 2020)
A Novel Class-Center Vector Model for Text Classification Using Dependencies and a Semantic Dictionary
Abstract
Automatic text classification is a research focus and core technology in information retrieval and natural language processing. In contrast to traditional text classification methods (SVM, Bayesian, KNN), the class-center vector method is an important approach that offers lower computational cost and higher efficiency. However, the traditional class-center vector method suffers from class vectors that are large and sparse, and its classification accuracy is limited by the lack of semantic information. To overcome these problems, this paper proposes a novel class-center vector model for text classification using dependencies and a semantic dictionary. We use the WordNet semantic dictionary for English and the Tongyici Cilin semantic dictionary for Chinese to cluster the feature words in the class-center vector, significantly reducing its dimensionality and thereby realizing a new class-center vector model for text classification based on dependencies and a semantic dictionary. Experiments show that, compared with traditional text classification algorithms, the improved class-center vector method achieves lower time complexity and higher accuracy on the 20Newsgroups English corpus and the Fudan and Sogou Chinese corpora. This paper is an extended version of our NLPCC 2019 conference paper.
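To illustrate the general idea of a class-center vector classifier with dictionary-based feature clustering, the following is a minimal sketch, not the authors' implementation. It assumes English text, plain term-frequency vectors, cosine similarity, and NLTK's WordNet interface for merging feature words that share a synset; the paper's dependency-based feature extraction and Tongyici Cilin handling are omitted.

```python
# Minimal sketch: class-center vector classification with WordNet-based
# feature clustering. Illustrative only; assumes NLTK with the WordNet
# corpus installed (nltk.download('wordnet')).
from collections import Counter, defaultdict
import math

from nltk.corpus import wordnet as wn


def cluster_key(word):
    """Map a word to its first WordNet synset name, or to itself if unknown."""
    synsets = wn.synsets(word)
    return synsets[0].name() if synsets else word


def doc_vector(tokens):
    """Bag-of-clusters term-frequency vector for one tokenized document."""
    return Counter(cluster_key(t.lower()) for t in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train_class_centers(labeled_docs):
    """Average the clustered document vectors of each class into its center."""
    sums, counts = defaultdict(Counter), Counter()
    for tokens, label in labeled_docs:
        sums[label].update(doc_vector(tokens))
        counts[label] += 1
    return {label: Counter({k: v / counts[label] for k, v in vec.items()})
            for label, vec in sums.items()}


def classify(tokens, centers):
    """Assign the class whose center vector is most similar to the document."""
    vec = doc_vector(tokens)
    return max(centers, key=lambda label: cosine(vec, centers[label]))


if __name__ == "__main__":
    train = [
        ("the goalkeeper saved the penalty in the match".split(), "sports"),
        ("the striker scored a late goal".split(), "sports"),
        ("the senate passed the budget bill".split(), "politics"),
        ("the president signed the new law".split(), "politics"),
    ]
    centers = train_class_centers(train)
    print(classify("a goal was scored in the final match".split(), centers))
```

Because words sharing a synset collapse into one dimension before the class centers are averaged, the center vectors are both smaller and denser than with raw surface forms, which is the effect the dimensionality reduction in the abstract refers to.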
Keywords