Text Classification Using Word-Based PPM Models

Victoria Bobicev

Computer Science Journal of Moldova (Sep 2006)

Affiliations

Victoria Bobicev: Technical University of Moldova

Journal volume & issue: Vol. 14, no. 2(41)
pp. 183 – 201

Abstract

Text classification is one of the most topical problems in natural language processing. This paper describes the application of word-based PPM (Prediction by Partial Matching) models to automatic content-based text classification. Our main idea is that words, and especially word combinations, are more relevant features for many text classification tasks. In most cases the keywords of a document are not single words but combinations of two or three words. The experiments demonstrated that word-based PPM models are applicable to content-based text classification. Although in some cases the entropy difference that determined the choice was rather small (several hundredths), most of the documents (up to 97%) were classified correctly.
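
The abstract says a document's class is chosen by comparing entropies under per-class models. As an illustration only, here is a minimal sketch of that decision rule. It is not the paper's implementation: it replaces full PPM with a simplified order-1 word model that escapes to order 0 and then to a uniform distribution, using PPMC-style escape estimates without exclusion. The names `WordModel`, `classify`, and the `vocab_size` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


class WordModel:
    """Simplified word-level model (assumption: order 1 with escape, not full PPM)."""

    def __init__(self, text, vocab_size=100_000):
        words = text.lower().split()
        self.unigrams = Counter(words)        # order-0 counts
        self.bigrams = defaultdict(Counter)   # order-1 counts: previous word -> next word
        for prev, cur in zip(words, words[1:]):
            self.bigrams[prev][cur] += 1
        self.vocab_size = vocab_size          # assumed size of the uniform fallback alphabet

    def _prob(self, prev, word):
        """Probability of `word` after `prev`, escaping to shorter contexts when unseen."""
        p_escape = 1.0
        ctx = self.bigrams.get(prev)
        if ctx:
            total, distinct = sum(ctx.values()), len(ctx)
            if word in ctx:
                return ctx[word] / (total + distinct)
            p_escape *= distinct / (total + distinct)   # PPMC-style escape estimate
        total, distinct = sum(self.unigrams.values()), len(self.unigrams)
        if total:
            if word in self.unigrams:
                return p_escape * self.unigrams[word] / (total + distinct)
            p_escape *= distinct / (total + distinct)
        return p_escape / self.vocab_size               # uniform fallback for unknown words

    def cross_entropy(self, text):
        """Average number of bits per word needed to encode `text` with this model."""
        words = text.lower().split()
        if not words:
            raise ValueError("text contains no words")
        bits, prev = 0.0, None
        for word in words:
            bits -= math.log2(self._prob(prev, word))
            prev = word
        return bits / len(words)


def classify(document, models):
    """Return the label whose model gives the lowest cross-entropy for the document."""
    return min(models, key=lambda label: models[label].cross_entropy(document))
```

With one `WordModel` trained per class, `classify(document, models)` returns the class whose model encodes the document in the fewest bits per word. When two classes give nearly equal entropies, as in the cases mentioned in the abstract, the decision rests on a very small margin.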

Published in Computer Science Journal of Moldova

ISSN: 1561-4042 (Print); 2587-4330 (Online)
Publisher: Vladimir Andrunachievici Institute of Mathematics and Computer Science
Country of publisher: Moldova, Republic of
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.math.md/en/publications/csjm/
