Linguistik Online (Jan 2009)
Schweizer Text Korpus – Theoretical Foundations, Corpus Design and Query Options
Abstract
The SWISS TEXT CORPUS (CHTK) aims to document extensively the German language of the 20th century in Switzerland. In this capacity, and in its parallel function as a sub-corpus of the Corpus C4, which will comprise 20 million text words (tokens) each from Germany, Austria, Italy/South Tyrol and Switzerland, it serves as a classical reference corpus for standard German both in Switzerland and in the entire German-speaking area of Western Europe. A reference corpus should comprehensively represent the central repertoire of a language, i.e. its generally used vocabulary, which is why questions of corpus structure and general planning (corpus design) play a decisive role (cf. Lemnitzer/Zinsmeister (2006: 106), where the reference corpus is contrasted with the special corpus). Four and a half years after the start of the project, the SWISS TEXT CORPUS was made available to the general public as a research instrument in April 2009. The following article briefly outlines the history of this research project and discusses the fundamental and specific decisions that had to be made in designing such a reference corpus, as well as how the CHTK is compiled. Together with a concluding overview of some of the retrieval and analysis options offered by the CHTK, the article also conveys the potential of this new research instrument and supplies the background knowledge required to work with it. For reasons of space, the working methods, namely the corpus-driven approaches, cannot be discussed here (cf. Bubenhofer 2008, 2006).