IEEE Access (Jan 2023)

Developing an Open Domain Arabic Question Answering System Using a Deep Learning Technique

  • Yazeed Alkhurayyif,
  • Abdul Rahaman Wahab Sait

DOI
https://doi.org/10.1109/ACCESS.2023.3292190
Journal volume & issue
Vol. 11
pp. 69131 – 69143

Abstract

A question-answering system (QAS) retrieves a relevant response to a user query. Existing QASs often fall short of satisfying the user's intent. Researchers have recently focused on developing Arabic QASs, but a significant number of these systems are restricted to specific domains. This study therefore develops an open-domain Arabic QAS using a deep learning technique. The proposed QAS comprises three phases: data pre-processing, named entity relationship, and response retrieval. In the pre-processing phase, the researchers apply de-diacritization, orthographic ambiguity reduction, tokenization, and morphological analysis to extract the key terms from the Arabic content, which helps the QAS overcome the challenges of understanding Arabic text. A Multinomial Naïve Bayes algorithm is applied to uncover the relationships among the Arabic terms. In addition, the authors employ the Embeddings from Language Models (ELMo) approach with a quaternion long short-term memory neural network (QLSTM) to construct the QAS with limited resources. The Arabic Reading Comprehension Dataset (ARCD) and TyDiQA are used to evaluate performance. On ARCD, the proposed QAS achieves accuracy, precision, recall, F1-score, MCC, and Kappa of 96.23, 97, 96.95, 97, 95.98, and 95.7, respectively; on TyDiQA, the corresponding values are 95.35, 94.8, 94.68, 94.73, 92.98, and 93.6. The structure of the proposed QAS is lightweight and can be implemented in real-world applications.
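
As an illustration of the pre-processing steps named in the abstract, the following is a minimal Python sketch of de-diacritization, a simple form of orthographic normalization, and tokenization. It is not the authors' implementation: the function name `preprocess`, the normalization table, and the regular-expression tokenizer are assumptions chosen for illustration, and morphological analysis is omitted because it would require a dedicated analyzer.

```python
import re

# Arabic diacritics: tanween, short vowels, shadda, sukun (U+064B-U+0652)
# and the superscript (dagger) alef (U+0670).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character, carries no lexical meaning

# Assumed normalization table (illustrative, not taken from the paper):
# collapse hamzated/madda alef forms to bare alef, alef maqsura to ya.
NORMALIZE = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda       -> alef
    "\u0649": "\u064A",  # alef maqsura          -> ya
})

# A token is a maximal run of Arabic letters (U+0621-U+064A).
TOKEN = re.compile(r"[\u0621-\u064A]+")


def preprocess(text: str) -> list[str]:
    """Return normalized Arabic tokens from raw text."""
    text = DIACRITICS.sub("", text)      # de-diacritization
    text = text.replace(TATWEEL, "")     # drop elongation marks
    text = text.translate(NORMALIZE)     # reduce orthographic variants
    return TOKEN.findall(text)           # tokenization


if __name__ == "__main__":
    # Fully vocalized word for "school", written with Unicode escapes.
    print(preprocess("\u0645\u064E\u062F\u0652\u0631\u064E\u0633\u064E\u0629\u064C"))
```

Normalizing alef variants trades some precision for recall: distinct spellings of the same word map to one key term, at the cost of occasionally merging words that differ only in hamza placement.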

Keywords