Journal of eScience Librarianship (Mar 2024)
Using AI/Machine Learning to Extract Data from Japanese American Confinement Records
Abstract
Purpose: This paper examines the use of Artificial Intelligence/Machine Learning to extract a more comprehensive dataset from a structured “standardized” form used to document Japanese American incarcerees during World War II. Setting/Participants/Resources: The Bancroft Library partnered with Densho, a community memory organization, and Doxie.AI to complete this work. Brief Description: The project digitized the complete set of Form WRA-26 “individual records” for more than 110,000 Japanese Americans incarcerated in War Relocation Authority camps during WWII. The library used AI/machine learning to automate text extraction from over 220,000 images of a structured “standardized” form; our goal was to improve upon the Japanese American Internee Data file held by the National Archives and Records Administration and to collect information not previously recorded in it. The project team worked with technical, academic, legal, and community partners to address ethical and logistical issues raised by the data extraction process, and to assess appropriate access options for the dataset(s) and digitized records. Results/Outcome: Using AI/machine learning increased the quality of the data extracted from the digitized WWII-era forms. Evaluation Method: The earlier dataset, extracted from 1940s computer punch cards, was compared with the current dataset extracted using AI/machine learning; the AI/machine learning approach showed marked improvement.
Keywords