Libellarium: Journal for the Research of Writing, Books, and Cultural Heritage Institutions (Mar 2017)
A decision support system to facilitate file format selection for digital preservation
Abstract
This paper presents a method to facilitate decision making for the preservation of digital content in libraries and archives using institutional risk profiles that highlight endangered file formats (those in danger of becoming inaccessible or unusable). The primary contribution of this work is the combined use of machine-mined data and human-expert input to select and configure institution-specific preservation risk profiles. The machine-mined data were gathered with the File Format Metadata Aggregator (FFMA) developed for this work, and the crowdsourced expert input was collected via two surveys of digital preservation practitioners. A by-product of this endeavor is the ability to visualize risk factors for analysis. The underlying decision support system uses the cosine similarity algorithm to recommend risk profiles that match an institution's selected risk settings. This method improves the interpretability of risk factor values and the quality of the digital preservation process. The aggregated information about the risk factors is presented as a multidimensional vector that shows a particular analysis focus and its resulting impact on selected file formats. Sample risk profile calculations and visualizations of risk factor dimensions are shared in the evaluation section.
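As a rough illustration of the matching step described in the abstract, the following minimal sketch ranks candidate risk profiles by their cosine similarity to an institution's risk settings. It is not the authors' implementation: the function names, the three risk dimensions, the profile names, and all numeric values are hypothetical and chosen only to show the calculation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Similarity is undefined for a zero vector; treat it as no match.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_profiles(settings, profiles):
    """Sort profiles from most to least similar to the given settings."""
    scored = [(name, cosine_similarity(settings, vector))
              for name, vector in profiles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: each position is one risk factor weight in [0, 1].
institution_settings = [0.9, 0.2, 0.5]
candidate_profiles = {
    "profile_a": [0.8, 0.1, 0.6],
    "profile_b": [0.1, 0.9, 0.3],
}

for name, score in rank_profiles(institution_settings, candidate_profiles):
    print(f"{name}: {score:.3f}")
```

Cosine similarity compares the direction of two vectors rather than their magnitude, so under these assumptions a profile is recommended when its relative emphasis across risk factors resembles the institution's, even if the absolute weights differ.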
Keywords