Scientific Data (Dec 2023)

Towards understanding policy design through text-as-data approaches: The policy design annotations (POLIANNA) dataset

  • Sebastian Sewerin,
  • Lynn H. Kaack,
  • Joel Küttel,
  • Fride Sigurdsson,
  • Onerva Martikainen,
  • Alisha Esshaki,
  • Fabian Hafner

DOI
https://doi.org/10.1038/s41597-023-02801-z
Journal volume & issue
Vol. 10, no. 1
pp. 1–13

Abstract

Despite the importance of ambitious policy action for addressing climate change, large-scale, systematic assessments of public policies and their design are lacking, because analysing text manually is labour-intensive and costly. POLIANNA is a dataset of policy texts from the European Union (EU) annotated according to theoretical concepts of policy design; it can be used to develop supervised machine learning approaches for scaling up policy analysis. The dataset consists of 20,577 annotated spans drawn from 18 EU climate change mitigation and renewable energy policies. We developed a novel coding scheme that translates existing taxonomies of policy design elements into a method for annotating text spans of one or more words. Here, we provide the coding scheme, a description of the annotated corpus, and an analysis of inter-annotator agreement, and we discuss potential applications. Because understanding policy texts remains difficult for current text-processing algorithms, we envision the dataset being used to build tools that assist the manual coding of policy texts by automatically proposing paragraphs that contain relevant information.