Term Weighting Schemes for Slovak Text Document Clustering

ZLACKÝ Daniel; STAŠ Ján; JUHÁR Jozef; CIŽMÁR Anton

Journal of Electrical and Electronics Engineering (May 2013)

Journal volume & issue: Vol. 6, no. 1
pp. 163 – 166

Abstract

Text representation is the task of transforming textual data into a multidimensional space with a corresponding weight for every word. We tested several widely used term weighting methods on a manually created database of Slovak Wikipedia articles. The resulting vector space models were used as input to unsupervised clustering algorithms, which cluster text documents based on these models. We tested nine different weighting schemes with the K-means clustering algorithm. The best results were obtained with the TF-RIDF weighting scheme. However, subsequent experiments with different clustering techniques did not confirm these results.
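
The pipeline summarized in the abstract (weight terms, build a vector space model, cluster the vectors) can be illustrated with a minimal sketch. The abstract does not give the formulas of the nine schemes, so this example assumes the common textbook TF-IDF weight (raw term frequency times log(N/df)); it is not the TF-RIDF scheme and not the authors' implementation. The function names `tfidf_vectors`, `cosine`, and `kmeans`, the fixed seed indices, and the four toy documents are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents.

    Assumption: weight = raw term frequency * log(N / document frequency).
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kmeans(vectors, init_idx, iters=10):
    """Plain K-means with squared Euclidean distance.

    Centroids are seeded from the vectors at init_idx (a hypothetical,
    deterministic choice made here so the sketch is reproducible).
    """
    centroids = [list(vectors[i]) for i in init_idx]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy, made-up documents (two about hockey, two about music).
docs = [
    "hokej liga gol hokej".split(),
    "hokej zapas gol".split(),
    "hudba koncert spev".split(),
    "hudba spev album".split(),
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, init_idx=[0, 2])
print(labels)
```

In this toy setup the two hockey documents share weighted terms and end up in one cluster, while the music documents form the other. Swapping the weight inside `tfidf_vectors` for a different scheme is the kind of change the paper's comparison evaluates.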

Published in Journal of Electrical and Electronics Engineering

ISSN: 1844-6035 (Print); 2067-2128 (Online)
Publisher: Editura Universităţii din Oradea
Country of publisher: Romania
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://electroinf.uoradea.ro/index.php/jeee.html
