Journal of Cultural Analytics (Jul 2021)

The Measure of the Archive: The Robustness of Network Analysis in Early Modern Correspondence

  • Yann C. Ryan
  • Sebastian E. Ahnert

Journal volume & issue
Vol. 6, no. 3

Abstract

Network analysis of historical correspondence can be a fruitful way to address historical research questions, and has been increasingly used in historical studies over the past decade. As with many areas of quantitative humanities research, the reliability of the results is often called into question, given that such approaches require 'hard data' as input, yet almost inevitably use datasets with partial or missing records. Other disciplines using network analysis have conducted robustness experiments designed to test the impact of data loss or error on their results. To test how missing data might affect our own area of research, we conducted a number of experiments designed to simulate the kinds of loss often seen in historical correspondence data, including random document loss, missing years, and errors in the disambiguation and de-duplication process. We tested a range of data loss scenarios (random single letters, folio books of manuscript letters, catalogues, and entire years) and a range of commonly used network metrics. The results show that most network centrality measures remain robust until a very large proportion of the data (60% or more) is removed. Some measures showed a linear change in robustness, while others remained high and then fell off sharply. Only one, transitivity (local clustering coefficient), was significantly impacted throughout. In addition, we tested the robustness of more complex network analysis results in the literature that combine several network metrics to highlight individuals in the network, and found that the same types of individuals would likely have been highlighted even with 50% random letter loss. Alongside the article is a web application, built using Shiny, which calculates robustness measures for a user-uploaded network dataset.
We conclude that researchers working with similar historical correspondence datasets may be able to treat network analysis results as robust in most cases, rather than working on the assumption that missing data would lead to very different findings.
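
The random letter loss experiment summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' code or their Shiny application: the toy letter list, the function names, the use of degree (number of distinct correspondents) as the metric, and the use of Spearman rank correlation as the robustness score are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

def degree(letters):
    """Number of distinct correspondents per person (undirected degree)."""
    neighbours = defaultdict(set)
    for sender, recipient in letters:
        neighbours[sender].add(recipient)
        neighbours[recipient].add(sender)
    return {person: len(others) for person, others in neighbours.items()}

def ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / math.sqrt(var_x * var_y)

def robustness(letters, fraction_removed, seed=0):
    """Correlate degree in the full network with degree after random loss."""
    rng = random.Random(seed)
    keep = round(len(letters) * (1 - fraction_removed))
    sample = rng.sample(letters, keep)
    full, partial = degree(letters), degree(sample)
    people = sorted(full)
    # People who vanish from the sampled network are scored as degree 0.
    return spearman([full[p] for p in people],
                    [partial.get(p, 0) for p in people])

# Hypothetical toy archive: one (sender, recipient) pair per surviving letter.
LETTERS = [
    ("A", "B"), ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("B", "F"), ("C", "D"), ("E", "F"), ("F", "G"),
]

if __name__ == "__main__":
    for fraction in (0.0, 0.2, 0.4, 0.6, 0.8):
        print(fraction, robustness(LETTERS, fraction, seed=1))
```

In a fuller experiment one would repeat each removal level over many random seeds and average the scores, and swap in other metrics or other loss scenarios (for example, removing every letter from a given year) by changing `degree` and the sampling step.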