'By the People' Crowdsourcing Datasets from the Library of Congress

Victoria Van Hyning; Lauren Algee; Mason Jones; Carlyn Osborn; Trevor Owens; Lauren Seroka; Abby Shelton

doi:10.5334/johd.67

Journal of Open Humanities Data (Feb 2022)

Affiliations

Victoria Van Hyning: College of Information Studies, University of Maryland, College Park
Lauren Algee: Digital Content Management Section, Library of Congress, Washington, DC
Mason Jones: College of Information Studies, University of Maryland, College Park
Carlyn Osborn: Digital Content Management Section, Library of Congress, Washington, DC
Trevor Owens: Digital Content Management Section, Library of Congress, Washington, DC
Lauren Seroka: Digital Content Management Section, Library of Congress, Washington, DC
Abby Shelton: Digital Content Management Section, Library of Congress, Washington, DC

DOI: https://doi.org/10.5334/johd.67
Journal volume & issue: Vol. 8

Abstract

The 'By the People' ('BTP') datasets comprise the text of selected Library of Congress (LOC) collections, transcribed by volunteers in the 'By the People' crowdsourced transcription program, which invites the public to transcribe historical documents. All transcriptions are created and reviewed by volunteers in a consensus-based model in which two or more volunteers must agree on a transcription for it to be considered complete. The resulting transcriptions are added to the digital collections alongside the images to improve the search and accessibility of the collections. Additionally, completed transcription “campaigns” are published as freely downloadable datasets of CSV files containing all campaign transcriptions, as well as minimal metadata. The datasets can support many purposes, including computational research in fields such as history, linguistics, economics, and political science.

Published in Journal of Open Humanities Data

ISSN: 2059-481X (Online)
Publisher: Ubiquity Press
Country of publisher: United Kingdom
LCC subjects: General Works: History of scholarship and learning. The humanities; Language and Literature
Website: https://openhumanitiesdata.metajnl.com/
