International Journal of Digital Curation (Dec 2012)
Digital Forensics Formats: Seeking a Digital Preservation Storage Container Format for Web Archiving
Abstract
In this paper we discuss archival storage container formats from the point of view of digital curation and preservation, an aspect of preservation overlooked by most other studies. Considering established approaches to data management as our jumping off point, we selected seven container format attributes that are core to the long term accessibility of digital materials. We have labeled these core preservation attributes. These attributes are then used as evaluation criteria to compare storage container formats belonging to five common categories: formats for archiving selected content (e.g. tar, WARC), disk image formats that capture data for recovery or installation (partimage, dd raw image), these two types combined with a selected compression algorithm (e.g. tar+gzip), formats that combine packing and compression (e.g. 7-zip), and forensic file formats for data analysis in criminal investigations (e.g. aff – Advanced Forensic File format). We present a general discussion of the storage container format landscape in terms of the attributes we discuss, and make a direct comparison between the three most promising archival formats: tar, WARC, and aff. We conclude by suggesting the next steps to take the research forward and to validate the observations we have made.