IEEE Access (Jan 2018)
Privacy-Preserving Compressed Reference-Oriented Alignment Map Using Decentralized Storage
Abstract
In bioinformatics, researchers have endeavored to resolve two issues: 1) how to increase the efficiency of storage through compression and 2) how to provide confidentiality for genome sequence data. To address these issues, the sequence alignment map (SAM), the binary alignment map (BAM), the compressed reference-oriented alignment map (CRAM), and the selective retrieval on encrypted and CRAM formats were proposed. However, because data in these formats are kept in centralized storage managed by genome testing organizations, the privacy of sensitive genome sequence data is not guaranteed. In this paper, we propose a new compressed reference-oriented alignment map, called decentralized storage and compressed reference-oriented alignment map (D-RAM), which preserves the privacy of genome sequence data using a decentralized storage architecture. The proposed D-RAM format combines reference-based compression with bzip2 compression to use storage space efficiently. In addition, to preserve the privacy of genome sequence data, the proposed decentralized storage architecture stores the private genome sequence data and the public genome sequence data separately. Experimental results on simulated and real genome sequence data show that the D-RAM format yields smaller genome sequence data than the other formats. We also analyze the computational complexity of recovering the genome sequence data from the attacker's perspective, and this theoretical analysis explains why the D-RAM format is safer than the other formats.
Keywords