Cryptography (Aug 2021)

Implementing Privacy-Preserving Genotype Analysis with Consideration for Population Stratification

  • Andre Ostrak,
  • Jaak Randmets,
  • Ville Sokk,
  • Sven Laur,
  • Liina Kamm

DOI
https://doi.org/10.3390/cryptography5030021
Journal volume & issue
Vol. 5, no. 3
p. 21

Abstract

Read online

In bioinformatics, genome-wide association studies (GWAS) are used to detect associations between single-nucleotide polymorphisms (SNPs) and phenotypic traits such as diseases. Significant differences in SNP counts between case and control groups can signal association between variants and phenotypic traits. Most traits are affected by multiple genetic locations. To detect these subtle associations, bioinformaticians need access to more heterogeneous data. Regulatory restrictions in cross-border health data exchange have created a surge in research on privacy-preserving solutions, including secure computing techniques. However, in studies of such scale, one must account for population stratification, as under- and over-representation of sub-populations can lead to spurious associations. We improve on the state of the art of privacy-preserving GWAS methods by showing how to adapt principal component analysis (PCA) with stratification control (EIGENSTRAT), FastPCA, EMMAX and the genomic control algorithm for secure computing. We implement these methods using secure computing techniques—secure multi-party computation (MPC) and trusted execution environments (TEE). Our algorithms are the most complex ones at this scale implemented with MPC. We present performance benchmarks and a security and feasibility trade-off discussion for both techniques.

Keywords