Informatics in Medicine Unlocked (Jan 2021)

Privacy-preserving string search on encrypted genomic data using a generalized suffix tree

  • Md Safiur Rahman Mahdi,
  • Md Momin Al Aziz,
  • Noman Mohammed,
  • Xiaoqian Jiang

Journal volume & issue
Vol. 23
p. 100525

Abstract

Background and objective: Efficient sequencing technologies generate a plethora of genomic data and make it available to researchers. Computing over a massive genomic dataset often requires outsourcing the data to the cloud. Before outsourcing, data owners encrypt sensitive data to ensure confidentiality. Outsourcing also relieves data owners of local storage management. Because genomic data is large in volume, executing researchers’ queries securely and efficiently is challenging. Methods: In this paper, we propose a method to securely perform substring search and set-maximal search on a single-nucleotide polymorphism (SNP) dataset using a generalized suffix tree. The proposed method guarantees the following: (1) data privacy, (2) query privacy, and (3) output privacy. It adopts the semi-honest adversary model, and the security of the data is guaranteed through encryption and garbled circuits. Results: Our experimental results demonstrate that the proposed method can compute secure substring and set-maximal searches against an SNP dataset of 2184 records (each record contains 10000 SNPs) in 2.3 and 2 s, respectively. Furthermore, we compared our results with existing techniques for secure substring and set-maximal search (Ishimaki et al., 2016; Shimizu et al., 2016) [1,2], and we achieved 400-fold and 2-fold improvements, respectively (Table 5).
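
For readers unfamiliar with the underlying index, the following is a minimal plaintext sketch of substring search over several records with a generalized suffix structure. It is not the paper's protocol: it omits encryption and garbled circuits entirely, uses a naive uncompressed suffix trie rather than a compact suffix tree, and does not cover set-maximal search. The binary encoding of SNP records and the names `Node`, `build_generalized_suffix_trie`, and `substring_search` are assumptions made for illustration.

```python
class Node:
    """One trie node; record_ids holds every record containing the path to it."""

    def __init__(self):
        self.children = {}
        self.record_ids = set()


def build_generalized_suffix_trie(records):
    """Insert every suffix of every record into one shared trie (naive, O(n^2) per record)."""
    root = Node()
    for record_id, sequence in enumerate(records):
        for start in range(len(sequence)):
            node = root
            for symbol in sequence[start:]:
                node = node.children.setdefault(symbol, Node())
                node.record_ids.add(record_id)
    return root


def substring_search(root, query):
    """Return the ids of all records that contain the non-empty query as a substring."""
    node = root
    for symbol in query:
        node = node.children.get(symbol)
        if node is None:
            return set()
    return set(node.record_ids)


# Hypothetical toy data: each record is a string of 0/1 SNP values (an assumed encoding).
records = ["0110", "1011", "0001"]
root = build_generalized_suffix_trie(records)
print(sorted(substring_search(root, "011")))  # [0, 1]
```

Storing the record ids at each node means a query is answered by a single walk from the root, at a cost proportional to the query length rather than the dataset size.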

Keywords