Information (Aug 2023)

Exploring a Multi-Layered Cross-Genre Corpus of Document-Level Semantic Relations

  • Gregor Williamson
  • Angela Cao
  • Yingying Chen
  • Yuxin Ji
  • Liyan Xu
  • Jinho D. Choi

DOI
https://doi.org/10.3390/info14080431
Journal volume & issue
Vol. 14, no. 8
p. 431

Abstract

This paper introduces a multi-layered cross-genre (MLCG) corpus, annotated for coreference resolution, causal relations, and temporal relations, comprising a variety of genres, from news articles and children’s stories to Reddit posts. Our results reveal distinctive genre-specific characteristics at each layer of annotation, highlighting unique challenges for both annotators and machine learning models. Children’s stories feature linear temporal structures and clear causal relations. In contrast, news articles employ non-linear temporal sequences with minimal use of explicit causal or conditional language and few first-person pronouns. Lastly, Reddit posts are author-centered explanations of ongoing situations, with occasional meta-textual reference. Our annotation schemes are adapted from existing work to better suit a broader range of text types. We argue that our multi-layered cross-genre corpus not only reveals genre-specific semantic characteristics but also indicates a rich contextual interplay between the various layers of semantic information. Our MLCG corpus is shared under the open-source Apache 2.0 license.

Keywords