Code4Lib Journal (Nov 2009)

Automated Metadata Formatting for Cornell’s Print-on-Demand Books

  • Dianne Dietrich

Journal volume & issue
no. 8
p. 2138

Abstract

Cornell University Library has made Print-on-Demand (POD) books available for many of its digitized out-of-copyright books. To produce book covers, the printer must be supplied with metadata from the MARC bibliographic record. Although authors' names are present in MARC records, they are given in an inverted order suited to alphabetical filing rather than the natural order that is desirable for book covers. This article discusses a process for parsing and manipulating the MARC author strings to identify their component parts and to create natural-order strings. In particular, it focuses on processing non-name information in author strings, such as titles that were commonly used in older works (e.g., baron or earl) and suffixes appended to names (e.g., "of Bolsena"). Relevant patterns are identified, and a Python script is used to manipulate the author name strings.
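As a rough illustration of the kind of transformation described, the following minimal sketch converts an inverted author string into natural order. It is not the article's script: the function name `natural_order`, the short `TITLES` list, the date pattern, and the rule of placing titles before the forename are all assumptions made for this example, and the input strings are illustrative rather than taken from Cornell's records.

```python
import re

# Hypothetical, deliberately short list of titles; a real list would be longer.
TITLES = {"baron", "earl", "sir"}

# Assumed pattern for life dates such as "1771-1832" or an open range "1950-".
DATE_RE = re.compile(r"^\d{3,4}\??-(\d{3,4}\??)?$")


def natural_order(author):
    """Turn an inverted author string into a natural-order string.

    Assumes comma-separated parts with the filing name (usually the
    surname) first. Dates are dropped, titles are moved to the front,
    and "of ..." suffixes are kept after the name.
    """
    parts = [p.strip() for p in author.split(",")]
    parts = [p for p in parts if p]
    if not parts:
        return ""

    surname, rest = parts[0], parts[1:]
    titles, forenames, suffixes = [], [], []

    for part in rest:
        bare = part.rstrip(".")  # ignore a trailing period when classifying
        if DATE_RE.match(bare):
            continue  # life dates are not wanted on a book cover
        if bare.lower() in TITLES:
            titles.append(bare)
        elif bare.lower().startswith("of "):
            suffixes.append(bare)
        else:
            forenames.append(part)  # keep periods on initials, e.g. "John A."

    return " ".join(titles + forenames + [surname] + suffixes)


print(natural_order("Scott, Walter, Sir, 1771-1832."))
print(natural_order("Christina, of Bolsena"))
```

Splitting on commas keeps the sketch short, but it would mishandle strings whose parts themselves contain commas; working from the individual MARC subfields where they are available would likely be more robust.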