Code4Lib Journal (May 2022)

Fractal in detail: What information is in a file format identification report?

  • Ross Spencer

Journal volume & issue
no. 53

Abstract

Read online

A file format identification report, such as those generated by digital preservation tools, DROID, Siegfried, or FIDO, contain an incredible wealth of information. Used to scan discrete sets of files comprising a part of, or the entirety of a digital collection, these datasets can serve as entry points for further activities including appraisal, identification of future work efforts, and the facilitation of transfer of digital objects into preservation storage. The information contained in them is fractal in detail and there are numerous outputs that can be generated from that detail. This paper describes the purpose of a file format identification report and the extensive information that can be extracted from one. It summarizes a number of ways of transforming them into the inputs for other systems and describes a handful of the tools already doing so. The paper concludes that describing a format identification report is a pivotal artefact in the digital transfer process, and asks the reader to consider how they might leverage them and the benefits doing so might provide.