GENERATION OF A SET OF KEY TERMS CHARACTERISING TEXT DOCUMENTS

Kristina Machova; Andrea Szaboova; Peter Bednar

Journal of Information and Organizational Sciences (Jun 2007)

Journal volume & issue: Vol. 31, no. 1

Abstract

This paper describes statistical methods (information gain, mutual χ² statistics, and the TF-IDF method) for generating key words from a collection of text documents. These key words should characterise the content of the documents and can be used to retrieve relevant documents from the collection. Relations between terms were detected on the basis of the conditional probability of term occurrences, with a focus on words that very often occur together. In this way, key words consisting of two terms were generated as well. Several tests were carried out using the 20 News Groups collection of text documents.

Published in Journal of Information and Organizational Sciences

ISSN: 1846-3312 (Print); 1846-9418 (Online)
Publisher: University of Zagreb, Faculty of Organization and Informatics
Country of publisher: Croatia
LCC subjects: Science: Science (General): Cybernetics: Information theory
Website: http://jios.foi.hr/index.php/jios/index
