Journal of King Saud University: Computer and Information Sciences (Feb 2020)

Feature selection using an improved Chi-square for Arabic text classification

  • Said Bahassine,
  • Abdellah Madani,
  • Mohammed Al-Sarem,
  • Mohamed Kissi

Journal volume & issue
Vol. 32, no. 2
pp. 225 – 231

Abstract

In text mining, feature selection (FS) is a common method for reducing the dimensionality of the feature space and improving classification accuracy. In this paper, we propose an improved Chi-square feature selection method (referred to, hereafter, as ImpCHI) to enhance the performance of Arabic text classification. We also compare this improved Chi-square with three traditional feature selection metrics, namely mutual information, information gain and Chi-square. Building on our previous work, we extend the assessment of the method to other evaluation measures using an SVM classifier. For this purpose, a dataset of 5070 Arabic documents is classified into six independent classes. The experimental findings show that combining the ImpCHI method with the SVM classifier outperforms the other combinations in terms of precision, recall and F-measure. This combination significantly improves the performance of the Arabic text classification model. The best F-measure obtained for this model is 90.50%, when the number of features is 900.

Keywords: Feature selection, Chi-square, Arabic text classification, Light stemming, Mutual information, Information gain, SVM, Decision tree
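
The abstract does not define ImpCHI, so the following is only a minimal sketch of the traditional Chi-square baseline it is compared against, not the authors' improved method. It assumes the textbook 2x2 term/category contingency table and ranks each term by its maximum score over categories. The function names (`chi_square`, `select_top_k`), the max-over-categories ranking, and the toy English tokens are all hypothetical choices made for illustration.

```python
from collections import Counter

def chi_square(a, b, c, d):
    """Standard chi-square score for one term and one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: term or category is constant
    return n * (a * d - c * b) ** 2 / denom

def select_top_k(docs, labels, k):
    """Return the k terms with the highest max-over-categories score.

    Assumption: ties are broken alphabetically so the result is stable.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()           # term -> number of documents containing it
    df_in_class = Counter()  # (term, category) -> number of documents
    for doc, label in zip(docs, labels):
        for term in set(doc):
            df[term] += 1
            df_in_class[(term, label)] += 1

    scores = {}
    for term in df:
        best = 0.0
        for category, size in class_sizes.items():
            a = df_in_class[(term, category)]
            b = df[term] - a
            c = size - a
            d = n_docs - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Toy corpus (hypothetical, already tokenized and stemmed).
docs = [
    ["goal", "match", "team"],
    ["match", "player", "goal"],
    ["market", "stock", "bank"],
    ["bank", "stock", "price"],
]
labels = ["sport", "sport", "economy", "economy"]
top_terms = select_top_k(docs, labels, 4)
```

In a setup like the one the abstract describes, the selected terms (900 in the best reported configuration) would then form the feature vector passed to a classifier such as an SVM.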