IEEE Access (Jan 2024)
A Hybrid Attention-Based Transformer Model for Arabic News Classification Using Text Embedding and Deep Learning
Abstract
The rapid growth of online news has made accurate classification of Arabic news items increasingly important for information management and analysis. This paper proposes a hybrid Attention-Based Transformer Model (ABTM) for Arabic news categorization that combines deep learning with classical text representations to improve classification accuracy and interpretability. To handle the complexities of the Arabic language and enrich the dataset, we use a thorough preprocessing pipeline that includes text cleaning, tokenization, lemmatization, and data augmentation. We combine a bespoke attention embedder with classic TF-IDF and Bag-of-Words features to produce a comprehensive feature set that captures both the contextual and statistical aspects of the text. We benchmark our technique against state-of-the-art Arabic language models, such as AraBERTv1-base and asafaya/bert-base-arabic. We use the LIME (Local Interpretable Model-agnostic Explanations) text explainer to offer insights into model predictions, improving the interpretability of our findings. Our results show that the ABTM strategy considerably enhances classification performance, with high accuracy and reasonable explanations for model decisions. The classification covers a wide range of news categories, including politics, sports, culture, the economy, and various other themes, reflecting the diversity of Arabic news. This study contributes to Arabic natural language processing by offering a novel method that combines deep learning with traditional techniques, thereby advancing the state of Arabic news classification. Improved classification accuracy and interpretability support better management and understanding of the rich and growing body of Arabic news content, and in turn informed decision-making and knowledge discovery.
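As a rough illustration of the hybrid feature set described above, the sketch below joins a dense embedding vector with TF-IDF and Bag-of-Words vectors for each document. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the features are fused, so plain concatenation is assumed, the smoothed IDF formula is one common variant, and the function names (`bag_of_words`, `tf_idf`, `combine_features`) and the two-dimensional `dense` vectors standing in for the attention embedder output are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs, vocab):
    """Raw term counts per document, one column per vocabulary term."""
    return [[Counter(doc)[term] for term in vocab] for doc in docs]


def tf_idf(docs, vocab):
    """Term frequency times a smoothed inverse document frequency.

    Assumption: idf = ln((1 + n) / (1 + df)) + 1, one common smoothed variant.
    """
    n_docs = len(docs)
    doc_freq = {term: sum(1 for doc in docs if term in doc) for term in vocab}
    idf = {term: math.log((1 + n_docs) / (1 + doc_freq[term])) + 1 for term in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] * idf[term] for term in vocab])
    return rows


def combine_features(dense, docs):
    """Concatenate [dense embedding | TF-IDF | Bag-of-Words] per document.

    Assumption: simple concatenation; the paper may fuse features differently.
    """
    vocab = sorted({term for doc in docs for term in doc})
    tfidf_rows = tf_idf(docs, vocab)
    bow_rows = bag_of_words(docs, vocab)
    combined = [list(e) + t + b for e, t, b in zip(dense, tfidf_rows, bow_rows)]
    return combined, vocab


# Toy, already tokenized documents (hypothetical data).
docs = [["الاقتصاد", "ينمو"], ["الفريق", "يفوز", "الفريق"]]
# Placeholder vectors standing in for the attention embedder output.
dense = [[0.1, 0.2], [0.3, 0.4]]

features, vocab = combine_features(dense, docs)
```

Each row of `features` has length `len(dense[0]) + 2 * len(vocab)`, so a downstream classifier sees the contextual and the statistical views of a document side by side.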
Keywords