Linguistik Online (Jan 2009)

Verteilte Korpusabfragesysteme [Distributed Corpus Query Systems]

  • Roth, Tobias

Journal volume & issue
Vol. 38, no. 2
pp. 67 – 78

Abstract

Distributed text corpora have seen little use so far. The Swiss Text Corpus (CHTK) and its partner projects set up a distributed corpus for German ("Korpus C4"), virtually merging parts of their corpus data and making them available through one common query platform. Based on experience gained during this project, we propose a possible path towards a more standardised interface for distributed corpus queries. This should make it easier to integrate new as well as existing corpora into distributed corpus systems. Special attention is paid to problems such as responsibility assignment, performance, user management, format unification and metadata synchronisation.