IEEE Access (Jan 2024)

The Outcomes and Publication Standards of Research Descriptions in Document Classification: A Systematic Review

  • Marcin Marcin Michal Mironczuk,
  • Adam Muller,
  • Witold Pedrycz

DOI
https://doi.org/10.1109/ACCESS.2024.3513550
Journal volume & issue
Vol. 12
pp. 189253 – 189287

Abstract

Read online

Document classification, a critical area of research, employs machine and deep learning methods to solve real-world problems. This study attempts to highlight the qualitative and quantitative outcomes of the literature review from a broad range of scopes, including machine and deep learning methods, as well as solutions based on nature, biological, or quantum physics-inspired methods. A rigorous synthesis was conducted using a systematic literature review of 102 papers published between 2003 and 2023. The 20 Newsgroups (bydate version) were used as a reference point of benchmarks to ensure fair comparisons of methods. Qualitative analysis revealed that recent studies utilize Graph Neural Networks (GNNs) combined with models based on the transformer architecture and propose end-to-end solutions. Quantitative analysis demonstrated state-of-the-art results, with accuracy, micro and macro F1-scores of 90.38%, 88.28%, and 89.38%, respectively. However, the reproducibility of many studies may need to be revised for the scientific community. The resulting overview covers a wide range of document classification methods and can contribute to a better understanding of this field. Additionally, the systematic review approach reduces systematic error, making it useful for researchers in the document classification community.

Keywords