IEEE Access (Jan 2019)

Classification of Hadith According to Its Content Based on Supervised Learning Algorithms

  • Hammam M. Abdelaal,
  • Berihan R. Elemary,
  • Hassan A. Youness

DOI
https://doi.org/10.1109/ACCESS.2019.2948159
Journal volume & issue
Vol. 7
pp. 152379 – 152387

Abstract


The Prophet’s Hadith is of great importance to Muslims all over the world: it is the second source of Islam after the Qur’an and the fundamental resource of legislation in the Islamic community. This study focuses on automatically classifying hadith into different categories according to their content, based on the hadith text. The objective is to build a classifier model that can differentiate hadith categories and predict a hadith's topic, such as prayer, fasting, or zakat, using data mining and machine learning techniques. Several supervised learning algorithms, plus combination methods such as the stacking algorithm, were used to improve classification accuracy. The three best classifiers were evaluated in detail: Decision Tree (DT), Random Forest (RF), and Naïve Bayes (NB), which achieved accuracies of up to 0.965, 0.956, and 0.951, respectively. In addition, binary (Boolean algebra) and TF-IDF methods were applied as term weighting to determine the frequency of each word in the hadith text, and Information Gain (IG) and Chi-square (CHI) were used to identify the most significant features in the training dataset. The experimental results showed that retraining these classifiers after applying IG and CHI as feature selection methods gave better accuracy than the previous results. Moreover, the best-performing classifier was DT: it achieved the highest accuracy in most test cases, with both Boolean and TF-IDF weighting, because it can deal with missing values and identify the most essential features in the training dataset, a task known as feature engineering.
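The term weighting and feature selection steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The toy corpus, the function names, and the particular TF-IDF variant (relative term frequency times log(N/df)) are all assumptions made for illustration, and the classifiers themselves (DT, RF, NB, stacking) are not shown.

```python
import math
from collections import Counter

# Hypothetical toy corpus: short English stand-ins, NOT the paper's dataset.
docs = [
    ("pray the night prayer in congregation", "prayer"),
    ("the prayer begins with ablution", "prayer"),
    ("fast the month and break the fast at sunset", "fasting"),
    ("the fast ends at sunset", "fasting"),
]
tokens = [text.split() for text, _ in docs]
labels = [label for _, label in docs]
vocab = sorted({t for doc in tokens for t in doc})

def binary_weights(doc):
    """Binary (Boolean) weighting: 1 if the term occurs in the document, else 0."""
    present = set(doc)
    return {t: 1 if t in present else 0 for t in vocab}

def tfidf_weights(doc):
    """TF-IDF weighting. Assumed variant: (count / doc length) * ln(N / df)."""
    counts = Counter(doc)
    n_docs = len(tokens)
    weights = {}
    for t in vocab:
        df = sum(1 for d in tokens if t in d)  # number of documents containing t
        weights[t] = (counts[t] / len(doc)) * math.log(n_docs / df)
    return weights

def entropy(class_counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(class_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts.values() if c)

def information_gain(term):
    """IG = H(class) - H(class | term present or absent)."""
    with_t = Counter(l for d, l in zip(tokens, labels) if term in d)
    without_t = Counter(l for d, l in zip(tokens, labels) if term not in d)
    n = len(labels)
    conditional = sum(sum(part.values()) / n * entropy(part)
                      for part in (with_t, without_t) if part)
    return entropy(Counter(labels)) - conditional

def chi_square(term, cls):
    """Chi-square statistic from the 2x2 term/class contingency table."""
    a = sum(1 for d, l in zip(tokens, labels) if term in d and l == cls)
    b = sum(1 for d, l in zip(tokens, labels) if term in d and l != cls)
    c = sum(1 for d, l in zip(tokens, labels) if term not in d and l == cls)
    d_ = sum(1 for d, l in zip(tokens, labels) if term not in d and l != cls)
    n = a + b + c + d_
    denom = (a + c) * (b + d_) * (a + b) * (c + d_)
    return n * (a * d_ - c * b) ** 2 / denom if denom else 0.0

# Rank terms by information gain; the top-ranked terms would be kept as features.
ranked = sorted(vocab, key=information_gain, reverse=True)
```

In this toy corpus, a term such as "fast" that appears only in one category gets the highest information gain and chi-square score, while "the", which appears in every document, scores zero on both and also receives a TF-IDF weight of zero. This is the intuition behind retraining a classifier on the selected features only.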

Keywords