Journal of the National Council of Less Commonly Taught Languages (Aug 2009)

ARIDA: An Arabic Inter-Language Database and Its Applications: A Pilot Study

  • Ghazi Abuhakema
  • Anna Feldman
  • Eileen Fitzpatrick

Journal volume & issue
Vol. 7
pp. 145 – 172

Abstract

This paper describes a pilot study in which we collected a small learner corpus of Arabic, developed a tagset for error annotation, and performed simple Computer-aided Error Analysis (CEA) on the data. For this study, we adapted the tagset of the French Interlanguage Database (FRIDA) (Granger, 2003a) to the data, choosing FRIDA in order to keep our tagging in line with a known standard. The paper describes the need for learner corpora, the learner data we have collected, the tagset we have developed and its advantages and disadvantages, the preliminary CEA results, the error frequency distribution for both proficiency levels, other potential applications of the error-annotated corpus of Arabic, and our ongoing work.

Keywords