HighTech and Innovation Journal (Dec 2024)
Contextual Semantic Embeddings Based on Transformer Models for Arabic Biomedical Questions Classification
Abstract
Arabic biomedical question classification (ABQC) is a challenging task for several reasons, including the specialized jargon expressed in the Arabic language, the complex semantics of Arabic vocabulary, and the lack of dedicated datasets and corpora. Only a few ABQC studies take word context into account when representing questions. In this work, we propose a classification model designed for Arabic biomedical questions. We build vector representations that capture the contextual and semantic information of Arabic biomedical text, which presents numerous challenges, such as the derivational morphology of the Arabic language, specialized biomedical terminology, and the absence of capitalization in Arabic text. Our representation adapts the extensive knowledge encoded in BERT (Bidirectional Encoder Representations from Transformers) and other transformer models to address these challenges. Several experiments were conducted on a dedicated Arabic biomedical dataset, MAQA, with well-known transformer models including BERT, AraBERT, BioBERT, RoBERTa, and DistilBERT, each fine-tuned for the classification task. The results show that our method achieves remarkable performance, with an accuracy of 93.31% and an F1-score of 93.35%.
DOI: 10.28991/HIJ-2024-05-04-011
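For reference, the two evaluation metrics reported above (accuracy and F1-score) can be computed from gold and predicted question labels as sketched below. This is an illustrative sketch only: the category names and toy predictions are hypothetical, not taken from the MAQA dataset, and the paper does not specify its exact averaging scheme (macro averaging is assumed here).

```python
def accuracy(y_true, y_pred):
    # Fraction of questions whose predicted class matches the gold class.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    # Per-class F1 averaged uniformly over all classes (macro averaging),
    # so rare question categories weigh as much as frequent ones.
    labels = set(y_true) | set(y_pred)
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy example with three hypothetical biomedical question categories.
y_true = ["diagnosis", "treatment", "treatment", "anatomy"]
y_pred = ["diagnosis", "treatment", "anatomy", "anatomy"]
print(round(accuracy(y_true, y_pred), 4))  # 0.75
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

In practice, libraries such as scikit-learn provide equivalent implementations, but the explicit version above makes the per-class precision/recall computation visible.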
Keywords