Science Journal of University of Zakho (Feb 2023)

The Kurdish Language Corpus: State of the Art

  • Media Azzat,
  • Karwan Jacksi,
  • Ismael Ali

DOI
https://doi.org/10.25271/sjuoz.2023.11.1.1123
Journal volume & issue
Vol. 11, no. 1

Abstract

The notable growth of digital communities and online news streams has led to the growing availability of online natural language content. However, not all natural languages have received enough attention to be made readable and comprehensible to machines. Among these less-resourced and under-studied languages is Kurdish. Creating machine-readable text is the first step toward applications of text mining and the semantic web, such as translation, information retrieval, and recommendation systems. Because of existing challenges in Kurdish, such as the scarcity of linguistic resources and the lack of unified orthography rules, the language lacks language processing tools. To overcome these challenges and enable intelligent applications, well-organized and annotated Kurdish text corpora are needed. This review paper investigates the available textual corpora in the Kurdish language and its dialects, then discusses the identified challenges, lists open problems, and suggests future directions.

Keywords