IEEE Access (Jan 2024)
Deep Learning in Written Arabic Linguistic Studies: A Comprehensive Survey
Abstract
This article presents a comprehensive survey of recent applications of deep learning (DL) algorithms to written Arabic. Despite the growing volume of user-generated content in Arabic, linguistic studies of the language suffer from a scarcity of analytical resources. Given the success of neural networks and DL in natural language processing tasks, the effectiveness of these methods in Arabic linguistic studies must be investigated. This study takes a linguistic perspective on the application of DL to written Arabic, analyzing 111 studies published in the current decade (2020–2024) and indexed in the Clarivate® Web of Science database. The studies are categorized under seven linguistic fields: forensic linguistics, educational linguistics, text linguistics, optical character recognition (OCR), artificial intelligence chatbots, poetry studies, and discourse analysis. DL has been applied predominantly to Arabic OCR and discourse analysis, and DL-based analyses of text-linguistic and forensic problems are thriving areas. However, the application of DL to educational linguistics and to automatic syntactic and morphological analyzers for Arabic has received limited attention. Across the reviewed studies, the overall accuracy of DL models on Arabic linguistic tasks is approximately 90.83%, which is a promising result. High performance was achieved in OCR (98.11%), text linguistics (93.57%), and forensic linguistics (92.10%). Discourse analysis (91.70%) and educational linguistics (90.19%) also show promising accuracy, whereas further improvements are required in poetry analysis (86.89%) and Arabic chatbots (83.29%). Among the DL architectures considered, the best results were obtained by models employing convolutional neural networks. Research gaps and directions for future research are presented.
Keywords