IEEE Access (Jan 2022)

A Comprehensive Review of Arabic Text Summarization

  • Asmaa Elsaid,
  • Ammar Mohammed,
  • Lamiaa Fattouh Ibrahim,
  • Mohammed M. Sakre

DOI
https://doi.org/10.1109/ACCESS.2022.3163292
Journal volume & issue
Vol. 10
pp. 38012 – 38030

Abstract

The explosion of online and offline data has changed how we gather, evaluate, and understand data. Comprehending large text documents and extracting crucial information from them is often difficult and time-consuming. Text summarization techniques address these problems by compressing long texts while retaining their essential content. These techniques are meant to deliver filtered, high-quality content to users quickly. Because of the massive amounts of data generated by technology and various sources, automated text summarization of large-scale data is challenging. There are three types of automatic text summarization techniques: extractive, abstractive, and hybrid. Regardless of the technique used, the generated summaries are still a long way from summaries produced by human experts. Although Arabic is a widely spoken language that is frequently used for content sharing on the web, summarization of Arabic text is limited and still immature because of several problems, including the Arabic language’s morphological structure, the variety of dialects, and the lack of adequate data sources. This paper reviews text summarization approaches and recent deep learning models for these approaches. It also reviews existing datasets for these approaches, along with their characteristics and limitations. The most often used metrics for summarization quality evaluation are ROUGE-1, ROUGE-2, ROUGE-L, and BLEU. The challenges encountered in Arabic text summarization methods and approaches, and the solutions proposed in each approach, are analyzed. Many Arabic text summarization methods have problems such as the lack of golden tokens during testing, out-of-vocabulary (OOV) words, repeated summary sentences, the lack of standard systematic methodologies and architectures, and the complexity of the Arabic language.
Finally, providing the required corpora, improving evaluation using semantic representations, addressing the lack of ROUGE metric usage in abstractive text summarization, and adopting recent deep learning models in Arabic summarization studies are essential demands.
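
As a rough illustration of the overlap metrics named in the abstract, the following minimal sketch computes recall-style ROUGE-1, ROUGE-2, and ROUGE-L scores over pre-tokenized text. It is not the evaluation code used in the reviewed studies: the function names are hypothetical, whitespace tokenization is an assumption, and real Arabic evaluation would usually add normalization and morphological tokenization first.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: clipped n-gram overlap over the reference n-gram count."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """ROUGE-L recall: LCS length over the reference length."""
    if not reference:
        return 0.0
    return lcs_length(candidate, reference) / len(reference)


# Toy example (English tokens for readability; any token strings work).
reference = "the cat sat on the mat".split()
candidate = "the cat lay on the mat".split()

r1 = rouge_n_recall(candidate, reference, 1)  # 5 of 6 unigrams matched
r2 = rouge_n_recall(candidate, reference, 2)  # 3 of 5 bigrams matched
rl = rouge_l_recall(candidate, reference)     # LCS of length 5 over 6 tokens
```

Because these scores count only surface token overlap, a correct paraphrase can score poorly, which is one reason the abstract calls for evaluation based on semantic representations.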

Keywords