International Journal of Digital Curation (Mar 2011)
Automation of Flexible Migration Workflows
Abstract
Many digital preservation scenarios are based on the migration strategy, which itself is heavily tool-dependent. For popular, well-defined and often open file formats – e.g., digital images, such as PNG, GIF, JPEG – a wide range of tools exist. Migration workflows become more difficult with proprietary formats, as used by the several text processing applications becoming available in the last two decades. If a certain file format can not be rendered with actual software, emulation of the original environment remains a valid option. For instance, with the original Lotus AmiPro or Word Perfect, it is not a problem to save an object of this type in ASCII text or Rich Text Format. In specific environments, it is even possible to send the file to a virtual printer, thereby producing a PDF as a migration output. Such manual migration tasks typically involve human interaction, which may be feasible for a small number of objects, but not for larger batches of files.We propose a novel approach using a software-operated VNC abstraction layer in order to replace humans with machine interaction. Emulators or virtualization tools equipped with a VNC interface are very well suited for this approach. But screen, keyboard and mouse interaction is just part of the setup. Furthermore, digital objects need to be transferred into the original environment in order to be extracted after processing. Nevertheless, the complexity of the new generation of migration services is quickly rising; a preservation workflow is now comprised not only of the migration tool itself, but of a complete software and virtual hardware stack with recorded workflows linked to every supported migration scenario. Thus the requirements of OAIS management must include proper software archiving, emulator selection, system image and recording handling. The concept of view-paths could help either to automatically determine the proper pre-configured virtual environment or to set up system images for certain migration workflows. View-paths may rise in demand, as the generation of PDF output files from Word Perfect input could be cached as pre-fabricated emulator system images. The current groundwork provides several possible optimizations, such as using the automation features of the original environments.