IEEE Access (Jan 2024)

Toward Automated Arabic Synonyms Extraction Using Arabic Lexical Substitution

  • Eman Naser-Karajah,
  • Nabil Arman

DOI
https://doi.org/10.1109/ACCESS.2024.3485502
Journal volume & issue
Vol. 12
pp. 174455 – 174463

Abstract


Lexical Substitution (LS) replaces a target word or phrase with synonyms that are equivalent in meaning. Despite the richness of the Arabic language, Arabic LS has received little attention because no benchmark evaluation datasets exist, even though researchers working on many other languages have shown interest in this task. This paper presents an LS pipeline, AraLexSubPro, which provides different techniques for generating, selecting, and ranking substitutions. For a thorough comparison, AraLexSubPro uses four baseline methods to generate substitution candidates for the target words: a synonym dictionary approach (AWN), a pre-trained language model approach (AraBERT), an AraBERT dropout approach (partial masking), and a hybrid approach using AraBERT and AWN. The generated substitutions are filtered and then ranked on three high-quality features: word similarity, word frequency, and the BERT score. The substitutions are then reranked by our AraLexSubPro ranker. The AraLexSubPro pipeline was evaluated on the first Arabic LS benchmark, the AraLexSubD dataset. This paper presents the first comprehensive study of the Arabic LS task.
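
The filter-then-rank stage described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Candidate` type, the function names, the placeholder words, the toy feature values, and the equal-weight linear combination are hypothetical and are not taken from the paper, whose AraLexSubPro ranker is not specified in the abstract. The sketch assumes each candidate already carries three scores normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A substitution candidate with three illustrative feature scores in [0, 1]."""
    word: str
    similarity: float   # word similarity to the target (assumed precomputed)
    frequency: float    # normalized corpus frequency (assumed precomputed)
    model_score: float  # contextual language-model score (assumed precomputed)


def filter_candidates(target, candidates):
    """Drop the target word itself and any repeated candidate words."""
    seen = set()
    kept = []
    for cand in candidates:
        if cand.word == target or cand.word in seen:
            continue
        seen.add(cand.word)
        kept.append(cand)
    return kept


def rank_candidates(candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Sort candidates by a weighted sum of the three features, best first.

    The equal weights are a placeholder, not the weighting used in the paper.
    """
    w_sim, w_freq, w_model = weights

    def score(cand):
        return (w_sim * cand.similarity
                + w_freq * cand.frequency
                + w_model * cand.model_score)

    return sorted(candidates, key=score, reverse=True)


# Toy input: placeholder words and made-up scores, for illustration only.
raw = [
    Candidate("target", 1.0, 0.5, 0.9),   # removed: identical to the target
    Candidate("cand_a", 0.8, 0.6, 0.7),
    Candidate("cand_b", 0.9, 0.2, 0.4),
    Candidate("cand_a", 0.8, 0.6, 0.7),   # removed: duplicate
    Candidate("cand_c", 0.6, 0.9, 0.9),
]
ranked = rank_candidates(filter_candidates("target", raw))
print([c.word for c in ranked])  # ['cand_c', 'cand_a', 'cand_b']
```

In a real pipeline the candidates would come from one of the four generation methods and the scores from the corresponding resources; a linear combination is only one simple way to merge several features into a single ranking.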

Keywords