International Journal of Data and Network Science (Jan 2024)

An improved multi-stage framework for large-scale hierarchical text classification problems using a modified feature hashing and bi-filtering strategy

  • Abubakar Ado,
  • Abdulkadir Abubakar Bichi,
  • Usman Haruna,
  • Mohammed Almaiah,
  • Yahaya Garba Shawai,
  • Rommel AlAli,
  • Tayseer Alkhdour,
  • Theyazn H.H Aldhyani,
  • Mahmoad Al-rawad,
  • Rami Shehab

DOI
https://doi.org/10.5267/j.ijdns.2024.6.012
Journal volume & issue
Vol. 8, no. 4
pp. 2193 – 2204

Abstract


Classifying a large-scale textual dataset involves a huge number of instances and millions of features, which must be discriminated among a large number of categories. This task, known as Large Scale Hierarchical Text Classification (LSHTC), relies on a predefined hierarchy and on tools that automatically classify instances within that hierarchy. The high number of features makes it hard for the employed classifiers to predict instance labels, and existing Dimensional Reduction (DR) approaches used within the LSHTC framework remain quite inefficient. An effective hierarchical dimensional reduction approach can therefore improve LSHTC performance. In this paper, we enhance the performance of LSHTC by proposing a Multi-stage Hierarchical Dimensional Reduction (MHDR) approach based on Modified Feature Hashing (MFH) and a Hierarchical Bi-Filtering (HBF) method. Besides alleviating bad collisions and result discrepancy, the proposed approach achieved the best micro-F1 and macro-F1 in our experiments, with average scores of 58.47% and 54.77%, respectively, using TD-SVM, and 51.14% and 48.70%, respectively, using TD-LR. The method was also 11% faster than the compared approaches.
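The abstract builds on feature hashing and mentions "bad collisions", but it does not describe how the authors' Modified Feature Hashing (MFH) works. For orientation only, here is a minimal sketch of the *standard* signed hashing trick. It is not the paper's MFH. The function names `hash_features` and `_hash` are hypothetical, and MD5 is an assumed choice of stable hash. Each token is mapped to one of `n_buckets` indices, and a second hash picks a +1/-1 sign. This fixes the vector length no matter how large the vocabulary is.

```python
import hashlib
from collections import Counter


def _hash(token: str, seed: str) -> int:
    # Deterministic 64-bit hash (Python's built-in hash() is salted per process).
    # MD5 is an illustrative choice here, not taken from the paper.
    digest = hashlib.md5((seed + token).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hash_features(tokens, n_buckets):
    """Map a bag of tokens to a fixed-length vector using signed feature hashing.

    Hypothetical helper illustrating the standard hashing trick, not the paper's MFH.
    """
    vec = [0.0] * n_buckets
    for token, count in Counter(tokens).items():
        index = _hash(token, "idx") % n_buckets                   # bucket for this feature
        sign = 1.0 if _hash(token, "sgn") % 2 == 0 else -1.0      # independent +/-1 sign
        vec[index] += sign * count                                # colliding features may cancel
    return vec


if __name__ == "__main__":
    print(hash_features(["gene", "protein", "gene"], 16))
```

The random sign is the design choice that matters. When two features collide in the same bucket, their contributions cancel in expectation instead of always piling up. This is one standard way to reduce collision bias. Any further modifications the paper makes are not reflected here.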