IEEE Access (Jan 2024)

Configurable Arithmetic Core Architecture for RNS-CKKS Homomorphic Encryption

  • Chulwoo Lee,
  • Hanyoung Lee,
  • Ardianto Satriawan,
  • Hanho Lee

DOI
https://doi.org/10.1109/ACCESS.2024.3473903
Journal volume & issue
Vol. 12
pp. 147220 – 147234

Abstract

Fully Homomorphic Encryption (FHE) enables privacy-preserving applications because it can perform arithmetic computations, such as addition and multiplication, on encrypted data without decrypting it first. However, its practical use is hindered by large data sizes and by significant computational and memory requirements. One of the bottlenecks is key-switching, which is required when performing homomorphic multiplications. In the CKKS scheme, each of the two ciphertexts being multiplied initially consists of two polynomial elements, and these elements are combined by dyadic multiplication. Consequently, the resulting ciphertext consists of three elements. An operation known as key-switching is required to relinearize the ciphertext from three to two elements and make it decryptable with the initial secret key. However, it is a computationally expensive operation, with the number theoretic transform (NTT) and its inverse (INTT) being the most time- and resource-consuming parts. To address this problem, this technical report proposes a configurable arithmetic core (CAC) hardware accelerator that can be used for key-switching in the CKKS scheme. The core can be configured for NTT, INTT, and multiply-and-accumulate (MAC) operations. We implemented our design on the AMD Xilinx Alveo U250 FPGA platform and used it to perform key-switching operations in the CKKS scheme. As a $2^{16}$ NTT/INTT accelerator, our design performs 11.33 times faster than the classical architecture and 1.07 times faster than the state-of-the-art architecture. Our design can also run at a higher frequency than others. As a key-switching accelerator, our FPGA implementation achieves a speed-up of about 1600 to 2700 times over the CPU implementation by OpenFHE. Compared to other FPGA designs, our key-switching accelerator offers more configurability on the multiplicative level.
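
To illustrate why a ciphertext product has three elements, the following is a minimal Python sketch, not the accelerator design described in the abstract. It assumes each ciphertext element is already in the NTT domain, so that polynomial multiplication reduces to dyadic (slot-wise) multiplication modulo a prime. The modulus `Q`, the size `N`, and all function names are hypothetical toy choices; noise, scaling, and the RNS decomposition are omitted.

```python
import random

Q = 65537  # toy prime modulus (assumption; real RNS-CKKS uses several larger primes)
N = 8      # toy number of NTT slots (assumption; the abstract targets 2**16)


def dyadic_mul(a, b, q=Q):
    """Slot-wise product of two NTT-domain vectors modulo q."""
    return [(x * y) % q for x, y in zip(a, b)]


def dyadic_add(a, b, q=Q):
    """Slot-wise sum of two NTT-domain vectors modulo q."""
    return [(x + y) % q for x, y in zip(a, b)]


def tensor_multiply(ct_a, ct_b):
    """Multiply two 2-element ciphertexts; the result has 3 elements."""
    a0, a1 = ct_a
    b0, b1 = ct_b
    d0 = dyadic_mul(a0, b0)
    d1 = dyadic_add(dyadic_mul(a0, b1), dyadic_mul(a1, b0))
    d2 = dyadic_mul(a1, b1)
    return d0, d1, d2


def rand_vec(rng):
    """Random NTT-domain vector standing in for a polynomial."""
    return [rng.randrange(Q) for _ in range(N)]


rng = random.Random(0)
s = rand_vec(rng)                       # stand-in for the secret key
ct_a = (rand_vec(rng), rand_vec(rng))   # stand-ins for two ciphertexts
ct_b = (rand_vec(rng), rand_vec(rng))

d0, d1, d2 = tensor_multiply(ct_a, ct_b)

# Product of the two "phases" c0 + c1*s, computed slot by slot.
phase_a = dyadic_add(ct_a[0], dyadic_mul(ct_a[1], s))
phase_b = dyadic_add(ct_b[0], dyadic_mul(ct_b[1], s))
lhs = dyadic_mul(phase_a, phase_b)

# The 3-element result only matches when evaluated with both s and s*s.
s_squared = dyadic_mul(s, s)
rhs = dyadic_add(dyadic_add(d0, dyadic_mul(d1, s)), dyadic_mul(d2, s_squared))
```

In this sketch `lhs` and `rhs` agree, which shows that the three-element result depends on the square of the secret key. Key-switching removes that dependence so the ciphertext is again two elements decryptable with the original key.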

Keywords