A Comprehensive Review of Arabic Text Summarization

Asmaa Elsaid; Ammar Mohammed; Lamiaa Fattouh Ibrahim; Mohammed M. Sakre

doi:10.1109/ACCESS.2022.3163292

IEEE Access (Jan 2022)

Affiliations

Asmaa Elsaid: Department of Computer Science, Faculty of Graduate Studies of Statistical Researches, Cairo University, Giza, Egypt
Ammar Mohammed: Department of Computer Science, Faculty of Graduate Studies of Statistical Researches, Cairo University, Giza, Egypt
Lamiaa Fattouh Ibrahim: Department of Computer Science, Faculty of Graduate Studies of Statistical Researches, Cairo University, Giza, Egypt
Mohammed M. Sakre: Higher Institute of Computer Science and Information Technology, El Shorouk, Egypt

DOI: https://doi.org/10.1109/ACCESS.2022.3163292
Journal volume & issue: Vol. 10
pp. 38012 – 38030

Abstract

The explosion of online and offline data has changed how we gather, evaluate, and understand information. Comprehending large text documents and extracting crucial information from them is often difficult and time-consuming. Text summarization techniques address these problems by compressing long texts while retaining their essential content. These techniques aim to deliver filtered, high-quality content to their users quickly. Because of the massive amounts of data generated by technology and various sources, automated text summarization of large-scale data is challenging. There are three types of automatic text summarization techniques: extractive, abstractive, and hybrid. Regardless of the technique used, the generated summaries remain a long way from those produced by human experts. Although Arabic is a widely spoken language that is frequently used for content sharing on the web, summarization of Arabic content is limited and still immature because of several problems, including the morphological structure of the Arabic language, the variety of its dialects, and the lack of adequate data sources. This paper reviews text summarization approaches and the recent deep learning models applied to them. It also reviews the existing datasets for these approaches, along with their characteristics and limitations. The most commonly used metrics for evaluating summarization quality are ROUGE-1, ROUGE-2, ROUGE-L, and BLEU. The challenges encountered by Arabic text summarization methods and approaches, and the solutions proposed in each approach, are analyzed. Many Arabic text summarization methods suffer from problems such as the lack of golden tokens during testing, out-of-vocabulary (OOV) words, repeated summary sentences, the lack of standard systematic methodologies and architectures, and the complexity of the Arabic language. Finally, there is an essential need to provide the required corpora, to improve evaluation using semantic representations, to address the limited use of ROUGE metrics in abstractive text summarization, and to adopt recent deep learning models in Arabic summarization studies.
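
As an illustration of the ROUGE metrics named in the abstract, the following is a minimal sketch of recall-oriented ROUGE-N (n-gram overlap) and ROUGE-L (longest common subsequence). It is not taken from the paper: the function names are hypothetical, and whitespace tokenization is a simplifying assumption that would likely need to be replaced by proper normalization and morphological tokenization for Arabic text.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(reference, candidate, n=1):
    """Fraction of reference n-grams that also occur in the candidate.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    ref = ngrams(reference.split(), n)
    cand = ngrams(candidate.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped overlap: each reference n-gram is matched at most as often
    # as it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(reference, candidate):
    """LCS length divided by the reference length (recall form of ROUGE-L)."""
    ref, cand = reference.split(), candidate.split()
    return lcs_length(ref, cand) / len(ref) if ref else 0.0


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(rouge_n_recall(reference, candidate, n=1))
    print(rouge_n_recall(reference, candidate, n=2))
    print(rouge_l_recall(reference, candidate))
```

Only the recall variants are shown here; published ROUGE scores are often reported as precision, recall, and F-measure, and established packages add options such as stemming that this sketch omits.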

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
