Topic- and Class-Based Weighting Method for Indonesian-Language Online News (Metode Pembobotan Berbasis Topik dan Kelas untuk Berita Online Berbahasa Indonesia)

Maryamah Maryamah; Made Agus Putra Subali; Lailly Qolby; Agus Zainal Arifin; Ali Fauzi

doi:10.26418/jlk.v1i1.4

Jurnal Linguistik Komputasional (Mar 2018)

Journal volume & issue: Vol. 1, no. 1
pp. 11 – 16

Abstract

Manually grouping news documents depends on the skill and accuracy of the person doing it, which can lead to errors in the grouping process. News documents therefore need to be grouped automatically. Such clustering requires a term-weighting method, for example TF.IDF.ICF. In this paper we propose a new weighting scheme, TF.IDF.ICF.ITF, for clustering documents automatically from statistical patterns in the data, so that the errors of manual grouping are reduced and the process becomes more efficient. K-Means++ is a clustering algorithm that extends K-Means at the initial cluster initialization stage; it is easy to implement and gives more stable results. K-Means++ groups the documents at the Inverse Class Frequency (ICF) weighting stage. ICF is developed from class-based weighting of the terms in a document. Terms that appear often across many classes receive a small but informative value. The proposed weighting is then calculated. Testing used specific queries over several numbers of best features, and the TF.IDF.ICF.ITF method gave less than optimal results.
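To make the class-based part of the weighting concrete, here is a minimal sketch of a TF.IDF.ICF computation in Python. The abstract gives no formulas, so the logarithmic forms `log(1 + N/df)` and `log(1 + C/cf)` are assumptions, and the function name `tf_idf_icf`, the toy documents, and the labels are hypothetical. The class labels are supplied by hand here; in the setting described above they would come from a clustering step such as K-Means++. The ITF factor of the proposed method is not defined in the abstract and is therefore left out.

```python
import math
from collections import Counter


def tf_idf_icf(docs, labels):
    """Return one {term: weight} dict per document.

    docs   -- list of token lists
    labels -- class (or cluster) id for each document
    Assumed forms: idf = log(1 + N/df), icf = log(1 + C/cf).
    """
    n_docs = len(docs)
    classes = set(labels)
    n_classes = len(classes)

    # df: number of documents containing each term
    df = Counter()
    # terms seen in each class, used to derive cf below
    class_terms = {c: set() for c in classes}
    for tokens, label in zip(docs, labels):
        unique = set(tokens)
        df.update(unique)
        class_terms[label].update(unique)

    # cf: number of classes containing each term
    cf = Counter()
    for terms in class_terms.values():
        cf.update(terms)

    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency within the document
        weights.append({
            t: tf[t]
            * math.log(1 + n_docs / df[t])
            * math.log(1 + n_classes / cf[t])
            for t in tf
        })
    return weights


# Toy corpus: "harga" occurs in both classes, "saham" in only one.
docs = [
    ["harga", "saham", "naik"],
    ["saham", "turun"],
    ["tim", "menang", "laga"],
    ["harga", "tiket", "laga"],
]
labels = [0, 0, 1, 1]
weights = tf_idf_icf(docs, labels)
```

In this toy corpus "harga" and "saham" have the same term frequency and document frequency in the first document, so the lower weight of "harga" comes only from the ICF factor: it appears in both classes, while "saham" appears in one.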

Published in Jurnal Linguistik Komputasional

ISSN: 2621-9336 (Online)
Publisher: Indonesia Association of Computational Linguistics (INACL)
Country of publisher: Indonesia
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing
Website: http://inacl.id/journal/index.php/jlk
