IEEE Access (Jan 2024)

Arabic Narrative Question Answering (QA) Using Transformer Models

  • Mohammad A. Ateeq,
  • Sabrina Tiun,
  • Hamed Abdelhaq,
  • Nawras Rahhal

DOI
https://doi.org/10.1109/ACCESS.2023.3348410
Journal volume & issue
Vol. 12
pp. 2760 – 2777

Abstract

The narrative question answering (QA) problem involves generating accurate, relevant, and human-like answers to questions based on the comprehension of a story consisting of logically connected paragraphs. Developing narrative QA models allows students to ask about inconspicuous narrative elements while reading a story. However, this problem remains unexplored for the Arabic language because of the lack of Arabic narrative datasets. To address this gap, we present the Arabic-NarrativeQA dataset, the first dataset specifically designed for machine reading comprehension of Arabic stories. The dataset consists of two parts: a translation of the English NarrativeQA dataset and a collection of new question-answer pairs based on Arabic stories. Furthermore, we implement the Arabic-NarrativeQA system using the Ranker-Reader pipeline, exploring and evaluating various approaches at each stage to identify the most effective ones. To avoid the need for an extensive data collection process, we use cross-lingual transfer learning to transfer knowledge from the English NarrativeQA dataset to the Arabic-NarrativeQA system. Experiments show that incorporating cross-lingual transfer learning significantly improves the performance of the reader models. Furthermore, the evidence information provided for each question in the Arabic-NarrativeQA dataset enables the learnable rankers to effectively identify and select the pertinent paragraphs. Finally, we examine and categorize challenging questions that require a deep understanding of the stories. By incorporating these question types into the introduced dataset, we show that existing reading comprehension models struggle to answer them and that further model development is needed. To promote further research on this task, we make both the Arabic-NarrativeQA dataset and the pre-trained models publicly available.

Keywords