PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database

Sara Jafarbeiki; Amin Sakzad; Shabnam Kasra Kermanshahi; Raj Gaire; Ron Steinfeld; Shangqi Lai; Gad Abraham; Chandra Thapa

Informatics in Medicine Unlocked (Jan 2022)

Affiliations

Sara Jafarbeiki: Faculty of Information Technology, Monash University, Clayton Campus, Melbourne, Australia; CSIRO DATA61, Australia; Corresponding author at: Faculty of Information Technology, Monash University, Clayton Campus, Melbourne, Australia.
Amin Sakzad: Faculty of Information Technology, Monash University, Clayton Campus, Melbourne, Australia
Shabnam Kasra Kermanshahi: The School of Computing Technologies, RMIT University, Melbourne, Australia
Raj Gaire: CSIRO DATA61, Australia
Ron Steinfeld: Faculty of Information Technology, Monash University, Clayton Campus, Melbourne, Australia
Shangqi Lai: Faculty of Information Technology, Monash University, Clayton Campus, Melbourne, Australia; CSIRO DATA61, Australia
Gad Abraham: The Systems Genomics Laboratory, Baker Institute, Melbourne, Australia
Chandra Thapa: CSIRO DATA61, Australia

Journal volume & issue: Vol. 31, p. 100988

Abstract

Privacy and security concerns, raised by the sensitivity of genomic data, limit query execution over genomics datasets, notably those containing single nucleotide polymorphisms (SNPs). It is therefore important to ensure that executing queries on these datasets does not reveal sensitive information, such as the identity of individuals and their genetic traits, to a data server. In this paper, we propose a novel model, which we call PrivGenDB, to ensure the confidentiality of SNP-phenotype data while executing queries. Confidentiality in PrivGenDB is enabled by its system architecture and by the search functionality of searchable symmetric encryption (SSE). To the best of our knowledge, PrivGenDB is the first SSE-based construction to ensure the confidentiality of SNP-phenotype data; current SSE-based approaches for genomic data are limited to substring search and range queries on a sequence of genomic data. In addition, a new data encoding mechanism is proposed and incorporated into the PrivGenDB model. This enables PrivGenDB to handle datasets containing both genotype and phenotype, and to store and manage other metadata, such as gender and ethnicity, privately. Furthermore, PrivGenDB supports several query types used in genomic data analysis, namely Count, Boolean, Negation and k′-out-of-k match queries. Our analytical and empirical analyses demonstrate that executing these queries in PrivGenDB is efficient and scalable for biomedical research and services. Specifically, our empirical studies on a dataset with 5000 entries (records) containing 1000 SNPs demonstrate that a count/Boolean query and a k′-out-of-k match query over 40 SNPs take approximately 4.3 s and 86.4 μs, respectively, outperforming the existing schemes.
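
To make the four query types named in the abstract concrete, the following is a minimal plaintext sketch of what each one might return on a toy SNP-phenotype table. It illustrates query semantics only and is not the PrivGenDB construction: there is no encryption, SSE index, or data encoding here. The record layout, SNP identifiers, function names, and the reading of a k′-out-of-k match as "at least k′ of the k queried genotypes match" are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy records: SNP genotypes plus a phenotype label.
RECORDS = [
    {"id": 1, "snps": {"rs1": "AA", "rs2": "AG", "rs3": "GG"}, "phenotype": "case"},
    {"id": 2, "snps": {"rs1": "AA", "rs2": "GG", "rs3": "GG"}, "phenotype": "control"},
    {"id": 3, "snps": {"rs1": "AC", "rs2": "AG", "rs3": "GT"}, "phenotype": "case"},
    {"id": 4, "snps": {"rs1": "AA", "rs2": "AG", "rs3": "GT"}, "phenotype": "case"},
]


def _hits(record: dict, genotypes: Dict[str, str]) -> int:
    """Number of queried SNP genotypes that this record matches."""
    return sum(record["snps"].get(snp) == g for snp, g in genotypes.items())


def boolean_query(records: List[dict], required: Dict[str, str],
                  phenotype: Optional[str] = None) -> List[int]:
    """IDs of records matching every required genotype (conjunction),
    optionally restricted to one phenotype."""
    return [r["id"] for r in records
            if _hits(r, required) == len(required)
            and (phenotype is None or r["phenotype"] == phenotype)]


def count_query(records: List[dict], required: Dict[str, str],
                phenotype: Optional[str] = None) -> int:
    """Number of records that satisfy the same conjunction."""
    return len(boolean_query(records, required, phenotype))


def negation_query(records: List[dict], required: Dict[str, str],
                   excluded: Dict[str, str]) -> List[int]:
    """IDs of records matching all required genotypes and none of the excluded ones."""
    return [r["id"] for r in records
            if _hits(r, required) == len(required) and _hits(r, excluded) == 0]


def k_prime_out_of_k_match(records: List[dict], genotypes: Dict[str, str],
                           k_prime: int) -> List[int]:
    """IDs of records matching at least k_prime of the k queried genotypes
    (assumed threshold interpretation)."""
    return [r["id"] for r in records if _hits(r, genotypes) >= k_prime]
```

For example, `boolean_query(RECORDS, {"rs1": "AA", "rs2": "AG"})` selects the records carrying both genotypes, and `k_prime_out_of_k_match(RECORDS, {"rs1": "AA", "rs2": "AG", "rs3": "GG"}, 2)` relaxes that to any two of three. In an SSE-based system, the server would evaluate such predicates over encrypted indexes without seeing the genotypes.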

Published in Informatics in Medicine Unlocked

ISSN: 2352-9148 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.journals.elsevier.com/informatics-in-medicine-unlocked/
