Journal of Open Humanities Data (Nov 2017)

Annotated References in the Historiography on Venice: 19th–21st centuries

  • Giovanni Colavizza
  • Matteo Romanello

DOI
https://doi.org/10.5334/johd.9
Journal volume & issue
Vol. 3

Abstract


We publish a dataset containing more than 40,000 manually annotated references from a broad corpus of books and journal articles on the history of Venice. References were taken from both reference lists and footnotes, and they include primary and secondary sources in full or abbreviated form. The dataset comprises references from publications dating from the 19th to the 21st century. References were collected from a newly digitized corpus and manually annotated in all their constituent parts. The dataset is stored in a GitHub repository, persisted in Zenodo, and accompanied by code to train parsers that extract references from other publications. Two trained Conditional Random Fields models are provided, along with their evaluation, to serve as a baseline for a shared task on reference parsing. No comparable public dataset exists to support reference parsing in the humanities. The dataset is of interest to anyone working on reference parsing and citation extraction in the humanities. Funding Statement: The project is supported by the Swiss National Fund, with grants 205121_159961 and P1ELP2_168489.
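Reference parsing with a Conditional Random Field is a sequence-labelling task: each token of a reference receives a label for the component it belongs to, and the model learns from per-token features. The following is a minimal sketch of that input preparation, assuming a token-level annotation format. The reference, the label names (`author`, `title`, `place`, `year`, `pagination`) and the feature set are all hypothetical illustrations, not the dataset's actual tag set or the features used for the baseline models. The output shape (one list of feature dicts per reference plus a parallel list of labels) is the kind of input CRF toolkits such as sklearn-crfsuite accept for training.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical annotated reference as (token, label) pairs.
# Label names are illustrative assumptions, not the dataset's real tag set.
ANNOTATED_REFERENCE: List[Tuple[str, str]] = [
    ("Rossi", "author"), (",", "author"), ("M.", "author"), (",", "author"),
    ("Storia", "title"), ("di", "title"), ("Venezia", "title"), (",", "title"),
    ("Venezia", "place"), ("1900", "year"), (",", "year"),
    ("p.", "pagination"), ("12", "pagination"),
]

YEAR_RE = re.compile(r"1[5-9]\d\d|20\d\d")  # plausible publication years
PUNCT_RE = re.compile(r"\W+")               # tokens made only of non-word chars


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for token i, with its immediate neighbours."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_digit": tok.isdigit(),
        "is_year": YEAR_RE.fullmatch(tok) is not None,
        "is_punct": PUNCT_RE.fullmatch(tok) is not None,
        "ends_with_dot": tok.endswith("."),
        "suffix3": tok[-3:],
    }
    # Context features: previous/next token, or sequence-boundary markers.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def to_crf_sequence(
    annotated: List[Tuple[str, str]],
) -> Tuple[List[Dict[str, object]], List[str]]:
    """Split (token, label) pairs into a feature sequence and a label sequence."""
    tokens = [tok for tok, _ in annotated]
    labels = [label for _, label in annotated]
    features = [token_features(tokens, i) for i in range(len(tokens))]
    return features, labels


X_seq, y_seq = to_crf_sequence(ANNOTATED_REFERENCE)
```

Training would then mean collecting one `(X_seq, y_seq)` pair per annotated reference and passing the lists to a CRF toolkit. Contextual features such as `prev_lower` and `next_lower` matter because a token like "Venezia" can belong to a title or a place of publication depending on its neighbours.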

Keywords