IEEE Access (Jan 2022)

Arabic Aspect Extraction Based on Stacked Contextualized Embedding With Deep Learning

  • Arwa Saif Fadel,
  • Mostafa Elsayed Saleh,
  • Osama Ahmed Abulnaja

DOI
https://doi.org/10.1109/ACCESS.2022.3159252
Journal volume & issue
Vol. 10
pp. 30526 – 30535

Abstract

The exponential growth of the internet and the multi-fold increase in social media users over the last decade have resulted in a massive growth of unstructured data. Aspect-Based Sentiment Analysis (ABSA) is challenging because it performs fine-grained analysis; it is a text analysis technique that groups opinions by aspect. The Aspect Extraction (AE) task is one of the core subtasks of ABSA; it identifies aspect terms in texts, comments, or reviews. The Arabic AE task is made more challenging by the complexity of the Arabic language. This work aims to improve Arabic AE by applying transfer learning with state-of-the-art pre-trained contextual language models. We concatenate the Bidirectional Encoder Representations from Transformers (BERT) language model and contextual string embeddings (Flair embeddings) as a stacked embeddings layer for better word representation for the Arabic language, and extend it with different deep learning network architectures. For Arabic AE, the model concatenates the Arabic contextual language model, AraBERT, and Flair embeddings as a contextual stacked embeddings layer, extended with a BiLSTM-CRF or BiGRU-CRF layer for sequence labeling. We call the proposed models BF-BiLSTM-CRF and BF-BiGRU-CRF. The models are evaluated on the Arabic Hotels' reviews dataset, with the F1 score as the performance measure. The experimental results show that the proposed BF-BiLSTM-CRF configuration outperforms the baseline and other models, achieving an F1 score of 79.7%.
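The architecture described in the abstract (stacked AraBERT + Flair embeddings feeding a BiLSTM-CRF or BiGRU-CRF sequence tagger) can be sketched with the open-source Flair library. The model identifiers, dataset path, column format, and hyperparameters below are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a stacked-embeddings aspect tagger with Flair.
# All identifiers (AraBERT model id, Arabic Flair embeddings, data paths,
# hyperparameters) are assumptions for illustration only.
from flair.datasets import ColumnCorpus
from flair.embeddings import (
    TransformerWordEmbeddings,
    FlairEmbeddings,
    StackedEmbeddings,
)
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Hypothetical CoNLL-style corpus: one token per line with a BIO aspect tag.
columns = {0: "text", 1: "aspect"}
corpus = ColumnCorpus("data/arabic_hotels", columns,
                      train_file="train.txt", test_file="test.txt")
tag_dictionary = corpus.make_label_dictionary(label_type="aspect")

# Stack AraBERT (assumed HuggingFace id) with Arabic Flair string embeddings.
stacked = StackedEmbeddings([
    TransformerWordEmbeddings("aubmindlab/bert-base-arabertv02"),
    FlairEmbeddings("ar-forward"),
    FlairEmbeddings("ar-backward"),
])

# BiLSTM-CRF tagger on top of the stacked embeddings
# (use rnn_type="GRU" for the BiGRU-CRF variant).
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=stacked,
    tag_dictionary=tag_dictionary,
    tag_type="aspect",
    use_crf=True,
    rnn_type="LSTM",
)

# Train and evaluate; Flair reports the F1 score on the test split.
ModelTrainer(tagger, corpus).train("models/bf-bilstm-crf",
                                   learning_rate=0.1,
                                   mini_batch_size=16,
                                   max_epochs=10)
```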

Keywords