PLoS ONE (Jan 2020)

Automated data extraction from historical city directories: The rise and fall of mid-century gas stations in Providence, RI.

  • Samuel Bell,
  • Thomas Marlow,
  • Kai Wombacher,
  • Anina Hitt,
  • Neev Parikh,
  • Andras Zsom,
  • Scott Frickel

DOI
https://doi.org/10.1371/journal.pone.0220219
Journal volume & issue
Vol. 15, no. 8
p. e0220219

Abstract

The locations of defunct environmentally hazardous businesses like gas stations have many implications for modern American cities. To track down these locations, we present the directoreadr code (github.com/brown-ccv/directoreadr). Using scans of Polk city directories from Providence, RI, directoreadr extracts and parses business location data with a high degree of accuracy. The image processing pipeline ran without any human input for 94.4% of the pages we examined; the remaining 5.6% required some human input. By hand-checking a sample of three years, we estimate that ~94.6% of historical gas stations are correctly identified and located, with historical street changes and non-standard address formats being the main drivers of errors. As an example use, we examine gas stations and find that they were most common early in the study period, in 1936, and began a sharp and steady decline around 1950. We are making the dataset produced by directoreadr publicly available. We hope it will be used to explore a range of important questions about socioeconomic patterns in Providence and cities like it during the transformations of the mid-1900s.