IEEE Access (Jan 2024)

Tasneef: A Fast and Effective Hybrid Representation Approach for Arabic Text Classification

  • Maroua Louail,
  • Chafia Kara-Mohamed Hamdi-Cherif,
  • Aboubekeur Hamdi-Cherif

DOI
https://doi.org/10.1109/ACCESS.2024.3450507
Journal volume & issue
Vol. 12
pp. 120804 – 120826

Abstract

The role of the Arabic language in current global affairs calls for sophisticated natural language processing techniques, especially in text classification. This paper presents Tasneef, a novel hybrid approach that tackles computational challenges by reducing memory usage and runtime overhead in Arabic text classification (ATC). Tasneef integrates a distance-based meta-features (DBMFs) representation with word embeddings. This integration is useful because a single text representation technique can fail to capture the full range of features needed for effective classification, especially in complex languages like Arabic. DBMFs are shown to offer a promising way to address the high dimensionality and sparsity inherent in the Term Frequency-Inverse Document Frequency (TF-IDF) representation. They rely on document labels and statistical features to establish meaningful distance relationships between documents, thereby facilitating effective dimensionality reduction, while word embeddings capture semantic attributes. Empirical assessments reveal a reduction of about two orders of magnitude in both memory usage and runtime: memory savings range from 158x to 361x and runtime reductions from 120x to 524x across three popular datasets, while MicroF1 and MacroF1 values remain comparable and learning time is notably reduced. Moreover, across four additional datasets, Tasneef outperforms ten state-of-the-art deep learning models and seven dimension reduction methods in accuracy, with improvements ranging from 0.3% to 39.6%, and in F-Measure, with improvements from 4.6% to 26.8%. These findings highlight Tasneef as a promising solution for diverse real-world ATC applications, offering concise and rapid classification with reduced computational learning costs.
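
To make the hybrid representation more concrete, the following is a minimal sketch of the general idea only, not the authors' implementation. It assumes one simple kind of distance-based meta-feature (the cosine distance from a document's TF-IDF vector to each class centroid) and a mean-pooled word embedding, then concatenates the two. All function names, the toy English tokens, and the two-dimensional `toy_embeddings` table are hypothetical; the paper's actual DBMF definitions and embedding model are not given in this abstract.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF; the exact weighting scheme is an assumption.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def class_centroids(vectors, labels):
    """Average the TF-IDF vectors of each class (this step uses the labels)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {lab: {t: w / counts[lab] for t, w in s.items()}
            for lab, s in sums.items()}


def mean_embedding(doc, embeddings, dim):
    """Mean-pool the embeddings of the tokens that have one."""
    hits = [embeddings[t] for t in doc if t in embeddings]
    if not hits:
        return [0.0] * dim
    return [sum(v[i] for v in hits) / len(hits) for i in range(dim)]


def hybrid_features(docs, labels, embeddings, dim):
    """Concatenate per-class distances with a mean word embedding.

    Note: computing centroids on the same documents that are then encoded
    leaks label information; a real pipeline would build training features
    out-of-fold and reuse the training centroids for test documents.
    """
    vectors = tfidf_vectors(docs)
    centroids = class_centroids(vectors, labels)
    classes = sorted(centroids)
    feats = []
    for doc, vec in zip(docs, vectors):
        meta = [cosine_distance(vec, centroids[c]) for c in classes]
        feats.append(meta + mean_embedding(doc, embeddings, dim))
    return feats


# Toy data (hypothetical); real input would be preprocessed Arabic tokens.
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["bank", "loan", "rate"],
    ["rate", "market", "bank"],
]
labels = ["sport", "sport", "economy", "economy"]
toy_embeddings = {
    "goal": [1.0, 0.0], "team": [0.9, 0.1],
    "bank": [0.0, 1.0], "rate": [0.1, 0.9],
}
features = hybrid_features(docs, labels, toy_embeddings, dim=2)
```

In this sketch each document ends up with a dense vector whose length is the number of classes plus the embedding dimension, rather than one entry per vocabulary term. That size difference is the intuition behind the memory and runtime savings reported above, although the figures in the abstract come from the authors' own method and datasets.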

Keywords