Code4Lib Journal (Jul 2017)

New Metadata Recipes for Old Cookbooks: Creating and Analyzing a Digital Collection Using the HathiTrust Research Center Portal

  • Gioia Stevens

Journal volume & issue
no. 37

Abstract

Read online

The Early American Cookbooks digital project is a case study in analyzing collections as data using HathiTrust and the HathiTrust Research Center (HTRC) Portal. The purposes of the project are to create a freely available, searchable collection of full-text early American cookbooks within the HathiTrust Digital Library, to offer an overview of the scope and contents of the collection, and to analyze trends and patterns in the metadata and the full text of the collection. The digital project has two basic components: a collection of 1450 full-text cookbooks published in the United States between 1800 and 1920 and a website to present a guide to the collection and the results of the analysis. This article will focus on the workflow for analyzing the metadata and the full-text of the collection. The workflow will cover: 1) creating a searchable public collection of full-text titles within the HathiTrust Digital Library and uploading it to the HTRC Portal, 2) analyzing and visualizing legacy MARC data for the collection using MarcEdit, OpenRefine and Tableau, and 3) using the text analysis tools in the HTRC Portal to look for trends and patterns in the full text of the collection.