Linguistik Online (Jun 2016)

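FemSMA Corpus Workbench. Ein Werkzeug zur Unterstützung der qualitativen und quantitativen Analyse von textuellen Daten [FemSMA Corpus Workbench. A tool to support the qualitative and quantitative analysis of textual data]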

  • Brigitte Krenn

DOI
https://doi.org/10.13092/lo.76.2818
Journal volume & issue
Vol. 76, no. 2

Abstract

Many areas of (linguistic) research require the analysis of large amounts of textual data. Digitisation and the availability of computational linguistics tools offer substantial support for analysing such data sets qualitatively and quantitatively. Keeping, maintaining and presenting data and their metadata within one system facilitates data inspection and browsing. Data sets can be assessed quickly for the presence or absence of specific textual characteristics when manual annotation of text segments with theory-driven meta-information is combined with automatic analysis by computational linguistics tools and with computerised search. This contribution introduces the FemSMA Corpus Workbench (CWB), a computational linguistics tool for the manual and automatic annotation and analysis of text documents. CWB supports the storage, maintenance, annotation and search of textual data and related metadata. It is a client-server application whose web interface serves as the frontend for data inspection and manual annotation; data storage and automatic processing take place on the server side. The automatically annotated features are word-level features such as parts of speech; general word features such as capitalisation, character reduplication and abbreviation; and swear words and emotion words. Thanks to its modular system architecture, CWB can be extended flexibly. Doing so, however, requires computational linguists to adapt and extend CWB’s automatic analysis and search functionalities and to represent the new functionality in the web interface.
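
To illustrate what automatic annotation of general word features can look like, here is a minimal Python sketch. It is not CWB's implementation, which the abstract does not describe. The function names, the feature keys, the whitespace tokenisation and both regular expressions are assumptions made for this example. In particular, the abbreviation heuristic is deliberately naive and would also flag a sentence-final word such as "end.".

```python
import re

# Assumption: "character reduplication" means the same character
# occurring three or more times in a row, e.g. "soooo" or "!!!".
REDUPLICATION = re.compile(r"(.)\1{2,}")

# Assumption: an abbreviation is one or more letter groups, each
# followed by a period, e.g. "etc." or "z.B.". This is a naive heuristic.
ABBREVIATION = re.compile(r"^(?:[^\W\d_]+\.)+$")


def annotate_token(token: str) -> dict:
    """Return hypothetical word-level features for a single token."""
    return {
        "token": token,
        "capitalised": token[:1].isupper(),
        "all_caps": len(token) > 1 and token.isupper(),
        "reduplication": bool(REDUPLICATION.search(token)),
        "abbreviation": bool(ABBREVIATION.match(token)),
    }


def annotate_text(text: str) -> list:
    """Annotate every whitespace-separated token of a text."""
    return [annotate_token(token) for token in text.split()]
```

Calling `annotate_text("Sooooo z.B. NEIN!!!")` returns one feature dictionary per token. A real system would replace the whitespace split with a proper tokeniser and add lexicon lookups for swear words and emotion words.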