AN AUTOMATED NEW APPROACH IN FAST TEXT CLASSIFICATION: A CASE STUDY FOR KURDISH TEXT

Ari M. Saeed

doi:10.25271/sjuoz.2024.12.3.1296

Science Journal of University of Zakho (Jul 2024)

Affiliations

Ari M. Saeed: Department of Computer Science, College of Science, University of Halabja, Halabja, Kurdistan Region, Iraq

DOI: https://doi.org/10.25271/sjuoz.2024.12.3.1296
Journal volume & issue: Vol. 12, no. 3

Abstract

With the rapid development of internet technology, text classification has become a vital part of obtaining quick and accurate data. Traditional machine learning methods often suffer from poor performance and high-dimensional feature spaces, which reduce their accuracy. In this paper, the FastText model is applied to Kurdish text classification for the first time, and its results are compared with those of traditional machine learning methods to show its effect on Kurdish text. Four datasets are used to evaluate the model: Kurdish News Dataset Headlines (KNDH), Medical Kurdish Dataset (MKD), Kurdish-Emotional-Dataset (KMD-77000), and KurdiSent. The results are compared with traditional machine learning algorithms, namely Random Forest (RF), k-Nearest Neighbor (k-NN), Logistic Regression (LR), Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM), Decision Tree (DT), and Stochastic Gradient Descent (SGD), as well as with the deep learning model Bidirectional Encoder Representations from Transformers (BERT). The outcomes indicate that the FastText model achieved the highest performance on the KNDH dataset, with 89% for each of precision, recall, and F1-score, and 89.10% accuracy. On the KMD dataset, the FastText model outperforms all other models by approximately 2%. On KurdiSent, the comparative analysis showed that FastText is superior, with precision, recall, F1-score, and accuracy of 81.32%, 81.83%, 81.57%, and 81.4%, respectively. Finally, on MKD, the FastText model obtained the highest performance, with a precision of 93.32%, recall of 93.36%, F1-score of 93.34%, and accuracy of 93.1%.
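
The four metrics reported above can be illustrated with a short Python sketch. This is not the author's evaluation code: the function name `classification_report`, the toy labels, and the choice of macro averaging are all assumptions made for illustration, since the abstract does not state how the per-class scores were averaged.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the abstract does not say which
    averaging scheme was used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative

    def safe_div(a, b):
        return a / b if b else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), n),
        "recall": safe_div(sum(recalls), n),
        "f1": safe_div(sum(f1s), n),
    }


# Toy labels invented for illustration; they are not from any of the datasets.
y_true = ["sport", "sport", "health", "health"]
y_pred = ["sport", "health", "health", "health"]
report = classification_report(y_true, y_pred)
print(report)
```

In practice the predicted labels would come from the trained classifier (FastText or any of the baselines) applied to a held-out test split, and the same function would be run once per model and dataset to produce a comparison like the one summarized above.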

Published in Science Journal of University of Zakho

ISSN: 2663-628X (Print); 2663-6298 (Online)
Publisher: University of Zakho
Country of publisher: Iraq
LCC subjects: Science
Website: https://sjuoz.uoz.edu.krd/
