Rekayasa (Aug 2020)

Perbandingan Metode Term Weighting terhadap Hasil Klasifikasi Teks pada Dataset Terjemahan Kitab Hadis

  • Ana Tsalitsatun Ni'mah,
  • Agus Zainal Arifin

DOI
https://doi.org/10.21107/rekayasa.v13i2.6412
Journal volume & issue
Vol. 13, no. 2
pp. 172 – 180

Abstract

Read online

Hadis adalah sumber rujukan agama Islam kedua setelah Al-Qur’an. Teks Hadis saat ini diteliti dalam bidang teknologi untuk dapat ditangkap nilai-nilai yang terkandung di dalamnya secara pegetahuan teknologi. Dengan adanya penelitian terhadap Kitab Hadis, pengambilan informasi dari Hadis tentunya membutuhkan representasi teks ke dalam vektor untuk mengoptimalkan klasifikasi otomatis. Klasifikasi Hadis diperlukan untuk dapat mengelompokkan isi Hadis menjadi beberapa kategori. Ada beberapa kategori dalam Kitab Hadis tertentu yang sama dengan Kitab Hadis lainnya. Ini menunjukkan bahwa ada beberapa dokumen Kitab Hadis tertentu yang memiliki topik yang sama dengan Kitab Hadis lain. Oleh karena itu, diperlukan metode term weighting yang dapat memilih kata mana yang harus memiliki bobot tinggi atau rendah dalam ruang Kitab Hadis untuk optimalisasi hasil klasifikasi dalam Kitab-kitab Hadis. Penelitian ini mengusulkan sebuah perbandingan beberapa metode term weighting, yaitu: Term Frequency Inverse Document Frequency (TF-IDF), Term Frequency Inverse Document Frequency Inverse Class Frequency (TF-IDF-ICF), Term Frequency Inverse Document Frequency Inverse Class Space Density Frequency (TF-IDF-ICSδF), dan Term Frequency Inverse Document Frequency Inverse Class Space Density Frequency Inverse Hadith Space Density Frequency (TF-IDF-ICSδF-IHSδF). Penelitian ini melakukan perbandingan hasil term weighting terhadap dataset Terjemahan 9 Kitab Hadis yang diterapkan pada mesin klasifikasi Naive Bayes dan SVM. 9 Kitab Hadis yang digunakan, yaitu: Sahih Bukhari, Sahih Muslim, Abu Dawud, at-Turmudzi, an-Nasa'i, Ibnu Majah, Ahmad, Malik, dan Darimi. Hasil uji coba menunjukkan bahwa hasil klasifikasi menggunakan metode term weighting TF-IDF-ICSδF-IHSδF mengungguli term weighting lainnya, yaitu mendapatkan Precission sebesar 90%, Recall sebesar 93%, F1-Score sebesar 92%, dan Accuracy sebesar 83%. Comparison of a term weighting method for the text classification in Indonesian hadith Hadith is the second source of reference for Islam after the Qur’an. Currently, hadith text is researched in the field of technology for capturing the values of technology knowledge. With the research of the Book of Hadith, retrieval of information from the hadith certainly requires the representation of text into vectors to optimize automatic classification. The classification of the hadith is needed to be able to group the contents of the hadith into several categories. There are several categories in certain Hadiths that are the same as other Hadiths. Shows that there are certain documents of the hadith that have the same topic as other Hadiths. Therefore, a term weighting method is needed that can choose which words should have high or low weights in the Hadith Book space to optimize the classification results in the Hadith Books. This study proposes a comparison of several term weighting methods, namely: Term Frequency Inverse Document Frequency (TF-IDF), Term Frequency Inverse Document Frequency Inverse Class Frequency (TF-IDF-ICF), Term Frequency Inverse Document Frequency Inverse Class Space Density Frequency (TF-IDF-ICSδF) and Term Frequency Inverse Document Frequency Inverse Class Space Density Frequency Inverse Hadith Space Density Frequency (TF-IDF-ICSδF-IHSδF). This research compares the term weighting results to the 9 Hadith Book Translation dataset applied to the Naive Bayes classification engine and SVM. 9 Books of Hadith are used, namely: Sahih Bukhari, Sahih Muslim, Abu Dawud, at-Turmudzi, an-Nasa’i, Ibn Majah, Ahmad, Malik, and Darimi. The trial results show that the classification results using the TF-IDF-ICSδF-IHSδF term weighting method outperformed another term weighting, namely getting a Precession of 90%, Recall of 93%, F1-Score of 92%, and Accuracy of 83%.

Keywords