Sanadset 650K: Data on Hadith narrators

Mohammed Mghari; Omar Bouras; Abdelaaziz El Hibaoui

Data in Brief (Oct 2022)

Affiliations

Mohammed Mghari (corresponding author): Abdelmalek Essaâdi University, Faculty of Science, Computer Science Department, P.O. Box. 2121 M'Hannech II, Tetuan, 93030, Morocco
Omar Bouras: Abdelmalek Essaâdi University, Faculty of Science, Computer Science Department, P.O. Box. 2121 M'Hannech II, Tetuan, 93030, Morocco
Abdelaaziz El Hibaoui: Abdelmalek Essaâdi University, Faculty of Science, Computer Science Department, P.O. Box. 2121 M'Hannech II, Tetuan, 93030, Morocco

Journal volume & issue: Vol. 44, p. 108540

Abstract

The chain of narrators (Sanad) plays a vital role in deciding the authenticity of Islamic hadiths. However, the investigation and validation of a Sanad depend entirely on Hadith scholars, who ordinarily rely on their acquired knowledge, a process that requires a significant amount of effort and time.

Automated Sanad evaluation using machine learning algorithms is a promising way to address this problem, and it requires a representative Sanad dataset.

This paper presents a full hadith dataset, named Sanadset, which is made openly accessible to researchers. The Sanadset corpus contains over 650,986 records collected from 926 historical Arabic books of hadith. The dataset can be used for further investigation and classification of hadiths (strong/weak) and narrators (trustworthy/not) using AI techniques, and it can also serve as a linguistic resource for Arabic Natural Language Processing.

The dataset was collected from online Hadith sources using data scraping and web crawling. Its main contribution is the extraction of narrator chains that were originally present in textual form within Hadith books. Each observation in the dataset contains complete information about a specific hadith: original book, number, Hadith text, Matn, list of narrators, and number of narrators.
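As an illustration of the record structure described in the abstract, the following is a minimal Python sketch of how one observation might be represented in memory. The class name `HadithRecord`, the field names, the helper `adjacent_narrators`, and the placeholder values are all assumptions made for this example; they are not the column names or file format of the published dataset, which should be checked against the dataset itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HadithRecord:
    """One observation, mirroring the fields listed in the abstract.

    Field names are hypothetical; the published dataset may use different ones.
    """
    book: str                # original book the hadith was taken from
    number: int              # hadith number within that book
    hadith_text: str         # full text, including the narrator chain
    matn: str                # body of the hadith without the chain
    narrators: List[str] = field(default_factory=list)  # the Sanad, in order

    @property
    def narrator_count(self) -> int:
        # Derived from the list so the two values cannot disagree.
        return len(self.narrators)


def adjacent_narrators(record: HadithRecord) -> List[Tuple[str, str]]:
    """Return each pair of neighbouring narrators in the chain."""
    return list(zip(record.narrators, record.narrators[1:]))


# Placeholder values only; not taken from the dataset.
example = HadithRecord(
    book="Example Book",
    number=1,
    hadith_text="Narrator A, from Narrator B, from Narrator C: placeholder text",
    matn="placeholder text",
    narrators=["Narrator A", "Narrator B", "Narrator C"],
)
```

Pairs of neighbouring narrators are one possible starting point for the chain-level analysis the abstract mentions, for example as edges in a transmission graph; that use is a suggestion here, not something the source prescribes.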

Published in Data in Brief

ISSN: 2352-3409 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Science (General)
Website: http://www.journals.elsevier.com/data-in-brief/
