Journal of Electrical and Electronics Engineering (May 2013)
Term Weighting Schemes for Slovak Text Document Clustering
Abstract
Text representation is the task of transforming textual data into a multidimensional space with a corresponding weight for every word. We have tested several widely used term weighting methods on a manually created database of Slovak Wikipedia articles. The resulting vector space models were used as input to unsupervised clustering algorithms, which group text documents based on these models. We tested nine different weighting schemes with the K-means clustering algorithm. The best results were obtained with the TF-RIDF weighting scheme. However, subsequent experiments with different clustering techniques did not confirm these results.
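The pipeline summarized above (weight terms, build a vector space model, cluster the vectors) can be illustrated with a minimal sketch. It assumes standard TF-IDF as a familiar stand-in, since the abstract does not list the nine schemes or define TF-RIDF. The toy tokens, the function names `tf_idf_vectors` and `kmeans`, and the choice to seed K-means with the first k documents are all hypothetical and are not taken from the experiments.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenized docs."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * math.log(n_docs / df[t]) for t in vocab]
        norm = math.sqrt(sum(w * w for w in vec)) or 1.0
        vectors.append([w / norm for w in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-means (Lloyd's algorithm), seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two sport-like and two music-like documents.
docs = [
    ["futbal", "liga", "turnaj"],
    ["hudba", "koncert", "album"],
    ["futbal", "liga", "hokej"],
    ["hudba", "koncert", "gitara"],
]
vocab, vectors = tf_idf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

Swapping the weight expression inside `tf_idf_vectors` is the only change needed to try a different weighting scheme under this sketch, which mirrors how the clustering step stays fixed while the term weights vary.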