IEEE Access (Jan 2021)

Accelerating Fully Homomorphic Encryption Through Architecture-Centric Analysis and Optimization

  • Wonkyung Jung,
  • Eojin Lee,
  • Sangpyo Kim,
  • Jongmin Kim,
  • Namhoon Kim,
  • Keewoo Lee,
  • Chohong Min,
  • Jung Hee Cheon,
  • Jung Ho Ahn

DOI
https://doi.org/10.1109/ACCESS.2021.3096189
Journal volume & issue
Vol. 9
pp. 98772 – 98789

Abstract

Read online

Homomorphic Encryption (HE) has drawn significant attention as a privacy-preserving approach for cloud computing because it allows computation on encrypted messages called ciphertexts. Among the numerous HE schemes proposed thus far, HE for Arithmetic of Approximate Numbers (HEAAN) is rapidly gaining in popularity across a wide range of applications, as it supports messages that can tolerate approximate computations with no limit on the number of arithmetic operations applicable to the ciphertexts. A critical shortcoming of HE is the high computation complexity of ciphertext arithmetic; specifically, HE multiplication (HE Mul) is more than 10,000 times slower than the corresponding multiplication between unencrypted messages. This has led to a large body of HE acceleration studies, including those that exploit FPGAs; however, a rigorous analysis of the computational complexity and data access patterns of HE Mul is lacking. Moreover, the proposals mostly focused on designs with small parameter sizes, making it difficult accurately to estimate the performance of the HE accelerators when conducting a series of complex arithmetic operations. In this paper, we first describe how HE Mul of HEAAN is performed in a manner friendly to non-crypto experts. Then, we conduct a disciplined analysis of its computational and memory-access characteristics, through which we (1) extract parallelism in the key functions composing HE Mul and (2) demonstrate how to map the parallelism effectively to popular parallel processing platforms, CPUs and GPUs, by applying a series of optimizations such as transposing matrices and pinning data to threads. This leads to performance improvements of HE Mul on a CPU and a GPU by $2.06\times $ and $4.05\times $ , respectively, over the reference HEAAN running on a CPU with 24 threads.

Keywords