Taming our Wild Data

Renske van Enschot; Wilbert Spooren; Antal van den Bosch; Christian Burgers; Liesbeth Degand; Jacqueline Evers-Vermeul; Florian Kunneman; Christine Liebrecht; Yvette Linders; Alfons Maes

doi:10.51751/dujal16248

Dutch Journal of Applied Linguistics (Mar 2024)

Affiliations

Renske van Enschot: Tilburg University, Department of Communication and Cognition
Wilbert Spooren: Centre for Language Studies, Radboud University
Antal van den Bosch: Institute for Language Sciences, Utrecht University
Christian Burgers: Amsterdam School of Communication Research (ASCoR), University of Amsterdam
Liesbeth Degand: Institute for Language and Communication, University of Louvain
Jacqueline Evers-Vermeul: Institute for Language Sciences, Utrecht University
Florian Kunneman: Dept. Computer Science, Social AI, VU University Amsterdam
Christine Liebrecht: Tilburg center for Cognition and Communication, Tilburg University
Yvette Linders: Centre for Language Studies, Radboud University
Alfons Maes: Tilburg center for Cognition and Communication, Tilburg University

DOI: https://doi.org/10.51751/dujal16248
Journal volume & issue: Vol. 13

Abstract

Many research questions in the field of applied linguistics are answered by manually analyzing data collections or corpora: collections of spoken, written and/or visual communicative messages. In this kind of quantitative content analysis, the coding of subjective language data often leads to disagreement among raters. In this paper, we discuss causes of and solutions to such disagreement in the analysis of discourse. We discuss crucial factors that determine the quality and outcome of corpus analyses, focusing on the sometimes tense relation between reliability and validity, and we evaluate formal assessments of intercoder reliability. We suggest a number of ways to improve intercoder reliability, such as specifying the variables and their coding categories precisely and dividing the coding process into smaller substeps. The paper ends with a reflection on challenges for future work in discourse analysis, with special attention to big data and multimodal discourse.

Published in Dutch Journal of Applied Linguistics

ISSN: 2211-7245 (Print); 2211-7253 (Online)
Publisher: openjournals.nl
Country of publisher: Netherlands
LCC subjects: Language and Literature: Philology. Linguistics
Website: https://dujal.nl/index
