Journal of King Saud University: Computer and Information Sciences (Apr 2017)

Towards a standard Part of Speech tagset for the Arabic language

  • Imad Zeroual,
  • Abdelhak Lakhouaja,
  • Rachid Belahbib

DOI
https://doi.org/10.1016/j.jksuci.2017.01.006
Journal volume & issue
Vol. 29, no. 2
pp. 171 – 178

Abstract

Part of Speech (PoS) tagging has not yet been thoroughly investigated for the Arabic language. Determining the PoS tag of a word in a particular context is difficult, primarily because most contemporary texts are written without diacritics. Consequently, the same word may be spelled in different ways. Further, distinguishing between Arabic derivatives is a very challenging issue for the majority of PoS taggers. Hence, assigning the correct PoS tags requires advanced processing and considerable resources. This study aims to design detailed hierarchical levels of the Arabic tagset categories and their relationships. These hierarchical levels allow easier expansion when required and produce more accurate and precise results. They are based on a comparative study and important references in Arabic grammar, and they have been validated by experts in this field. In addition, the proposed tagset is implemented in a PoS tagger and tested in various experiments. We believe that our study makes a significant contribution to the literature because it is a step towards a standard, rich, and comprehensive tagset for Arabic.

Keywords