IEEE Access (Jan 2024)
Arabic Paraphrase Generation Using Transformer-Based Approaches
Abstract
Paraphrasing, the rephrasing of a sentence while preserving its underlying meaning, is important across many Natural Language Processing (NLP) applications. This research focuses on Arabic paraphrase generation and aims to introduce a model capable of generating diverse Arabic paraphrases through experimentation with deep learning models. The proposed model extends beyond conventional baseline approaches, incorporating Transformer-based architectures and ChatGPT models to enhance the richness and variety of the generated paraphrases. One notable challenge addressed in this study is the absence of an Arabic parallel paraphrase dataset. To close this gap, we propose an expanded paraphrase corpus that leverages synthetic data to support paraphrase generation. This augmentation aims both to fill a critical void in the available datasets and to provide a robust foundation for training and evaluating the paraphrase generation model. In the experimental phase, various models, including the baseline architecture and Transformer-based models, are examined to assess their effectiveness in generating meaningful Arabic paraphrases. Automatic evaluation shows that our fine-tuned GPT-3.5 model surpasses state-of-the-art methods, achieving scores of 23.69%, 88.30%, and 91.89% on BLEU, BERTScore, and COMET, respectively. Additionally, the fine-tuned AraT5v1 model shows an improvement of around 2.4% in BLEU score. In the human evaluation, Cohen's kappa reached 0.9. These findings highlight the potential of Transformer-based approaches in advancing Arabic paraphrase generation and affirm the effectiveness of our proposed model in improving the quality and diversity of generated paraphrases.
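For readers unfamiliar with the agreement statistic reported for the human evaluation, the following is a minimal sketch of how Cohen's kappa can be computed for two annotators. It is not the authors' evaluation code: the function name `cohen_kappa` and the example labels are hypothetical, and the sketch assumes each annotator assigns one categorical label per item.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item set.")

    n = len(rater_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the two raters' marginal frequencies,
    # summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Degenerate case: both raters used one identical label for every item.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical judgments (1 = acceptable paraphrase, 0 = not acceptable).
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 1, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
```

With these illustrative labels the observed agreement is 0.75 and the chance agreement is 0.5, giving a kappa of 0.5. Values close to 1, such as the 0.9 reported here, indicate agreement well beyond what chance would produce.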
Keywords