MULTI-DOMAIN MACHINE LEARNING APPROACH OF NAMED ENTITY RECOGNITION FOR ARABIC BOOKING CHATBOT ENGINES USING PRE-TRAINED BIDIRECTIONAL TRANSFORMERS

Boshra Taha Sadder; Rahma Taha Sadder; Gheith Abandah; Iyad Jafar

doi:10.5455/jjcit.71-1694435791

Jordanian Journal of Computers and Information Technology (Mar 2024)

Affiliations

Boshra Taha Sadder: The University of Jordan
Rahma Taha Sadder: The University of Jordan
Gheith Abandah: The University of Jordan
Iyad Jafar: The University of Jordan

DOI: https://doi.org/10.5455/jjcit.71-1694435791
Journal volume & issue: Vol. 10, no. 1, pp. 1-16

Abstract

Chatbots have recently become essential in various fields, ranging from customer service and information acquisition to entertainment. They reduce operational costs and human errors while providing services at any time. This work presents a Named Entity Recognition (NER) model for Arabic booking chatbots, focusing on booking tickets and appointments across multiple domains. It paves the way for chatbots that support multiple booking domains and contributes to the advancement of Arabic language support in this field. We adopt deep learning and transfer learning approaches to solve this task. Specifically, we fine-tune the AraBERTv0.2 base model to develop the Named Entity Recognition for Booking Queries (NERB) model. We then extend it to the Domain-Aware Named Entity Recognition for Booking Queries (DA-NERB) model by adding an input for the domain type and an embedding layer. The input to the proposed model is a text sequence of a reservation request, and the output is a sequence of tags representing the entities within the input sequence. For training and testing, we synthesized the Arabic Booking Chatbot-Synthetic Dataset (ABC-S Dataset), comprising 76,117 reservation samples that span seven domains and encompass 26 categories of named entities. Additionally, we collected the Arabic Booking Chatbot-Collected Dataset (ABC-C Dataset) from volunteers to evaluate our model on a variety of samples. Both datasets are written in informal Arabic, specifically the Levantine dialect. The proposed model achieves accuracy scores of 100% and 96.9% on ABC-S (test set) and ABC-C, respectively. Both the datasets and the code for our model are publicly available to support research in the field of Arabic chatbots. [JJCIT 2024; 10(1): 1-16]
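
To make the input/output format described in the abstract concrete, the following minimal sketch shows how a sequence of per-token tags can be grouped into named entities. It is an illustration only, not the authors' code: the abstract does not state the tagging scheme, so the BIO convention (B- for the first token of an entity, I- for a continuation, O for no entity) is an assumption, and the function name decode_entities, the entity labels DESTINATION and TIME, and the English stand-in tokens are all hypothetical.

```python
from typing import List, Tuple


def decode_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, entity_text) pairs.

    Assumes a BIO scheme, which the source abstract does not specify.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    current_type = None   # type of the entity being built, if any
    current_tokens = []   # tokens collected for that entity

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the one in progress first.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity in progress.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close any open entity and skip this token.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type = None
            current_tokens = []

    # Flush an entity that runs to the end of the sequence.
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hypothetical booking request (English stand-in for a Levantine Arabic query).
tokens = ["book", "a", "flight", "to", "Amman", "on", "Sunday", "morning"]
tags = ["O", "O", "O", "O", "B-DESTINATION", "O", "B-TIME", "I-TIME"]
print(decode_entities(tokens, tags))
```

In a domain-aware setup such as the DA-NERB model described above, the tagger would additionally receive a domain identifier with each request; the decoding step sketched here would stay the same.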

Published in Jordanian Journal of Computers and Information Technology

ISSN: 2413-9351 (Print); 2415-1076 (Online)
Publisher: Scientific Research Support Fund of Jordan (SRSF) and Princess Sumaya University for Technology (PSUT)
Country of publisher: Jordan
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://jjcit.org/
