Zeitschrift für digitale Geisteswissenschaften (Nov 2022)

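Gute Wörter, schwaches Gattungssignal. Differenzen zwischen Roman-Subgenres und Dramen mit Delta und signifikantem Wortschatz aufspüren [Good Words, Weak Genre Signal: Detecting Differences between Novel Subgenres and Dramas with Delta and Significant Vocabulary]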

  • Friedrich Michael Dimpel

DOI
https://doi.org/10.17175/2022_009_v2

Abstract

This study investigates to what extent the automatic recognition of genres and subgenres with Burrows’ Delta can be improved by significant vocabulary (›good words‹) and a Z-score limitation. The ›good words‹ are determined on one subcorpus for the genres adventure novel, Bildungsroman, social novel, comedy, and tragedy, and are evaluated on a second subcorpus. When all five text types are classified, these optimization measures raise the F1 values, for example from 0.65 to 0.77. When only adventure novel, Bildungsroman, and comedy are classified, the F1 values rise, for example, from 0.79 to 0.91. The classification of adventure novel versus drama and of comedy versus adventure novel and Bildungsroman succeeds without errors (ARI=1). While the ›good word procedure‹ increases recall, the Z-score limitation curbs false positives.
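As a rough illustration of the measures named in the abstract, the following minimal sketch computes Burrows’ Delta as the mean absolute difference of word-frequency z-scores. It rests on several assumptions that the abstract does not confirm: the Z-score limitation is modelled as clipping each z-score to a maximum absolute value (the hypothetical `clip` parameter), the ›good words‹ are modelled as a plain word list passed in as `words`, and the population standard deviation is used. The function names and the toy frequencies are illustrative only and are not taken from the article.

```python
from statistics import mean, pstdev


def z_scores(freqs_by_text, clip=None):
    """Turn relative word frequencies into z-scores per word.

    freqs_by_text maps a text id to {word: relative frequency}; every text
    is expected to contain the same words. If clip is given, each z-score
    is limited to the range [-clip, clip] (an assumed reading of the
    Z-score limitation).
    """
    texts = list(freqs_by_text)
    words = sorted(freqs_by_text[texts[0]])
    z = {text: {} for text in texts}
    for word in words:
        column = [freqs_by_text[text][word] for text in texts]
        mu, sigma = mean(column), pstdev(column)
        for text in texts:
            # A word with zero variance carries no signal, so it scores 0.
            value = 0.0 if sigma == 0 else (freqs_by_text[text][word] - mu) / sigma
            if clip is not None:
                value = max(-clip, min(clip, value))
            z[text][word] = value
    return z


def burrows_delta(z_a, z_b, words=None):
    """Mean absolute z-score difference, optionally restricted to a word list."""
    selected = list(z_a) if words is None else list(words)
    return sum(abs(z_a[w] - z_b[w]) for w in selected) / len(selected)


# Toy data: three texts, two words (hypothetical values).
toy = {
    "a": {"x": 0.1, "y": 0.2},
    "b": {"x": 0.3, "y": 0.2},
    "c": {"x": 0.2, "y": 0.2},
}
plain = z_scores(toy)
clipped = z_scores(toy, clip=1.0)
delta_plain = burrows_delta(plain["a"], plain["b"])
delta_clipped = burrows_delta(clipped["a"], clipped["b"])
delta_selected = burrows_delta(clipped["a"], clipped["b"], words=["x"])
```

In this sketch, clipping lowers the distance contributed by extreme frequencies, while restricting the word list concentrates the distance on the selected vocabulary. How the article actually selects ›good words‹ and sets the Z-score limit is described in the full text, not in the abstract.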

Keywords