Jisuanji kexue yu tansuo (Feb 2024)

Hardware Acceleration of Number Theoretic Transform in zk-SNARK

  • ZHAO Haixu, CHAI Zhilei, HUA Pengcheng, WANG Feng, DING Dong

DOI
https://doi.org/10.3778/j.issn.1673-9418.2211075
Journal volume & issue
Vol. 18, no. 2
pp. 538 – 552

Abstract

Read online

The proof in zk-SNARK has a fixed length and can be verified quickly, promoting the application of zero-knowledge proof in areas such as digital signature, blockchain, distributed storage, and outsourced computing. However, the generation of proofs is time-consuming and frequently used. As a result, NTT (number theoretic transform), one of the most time-consuming parts in proof-generation, needs to be accelerated significantly. However, the existing general NTT hardware acceleration methods cannot meet the requirements of large-bitwidth and large-scale in zk-SNARK. To address this issue, this paper proposes a highly pipelined architecture for NTT. First of all, large-bitwidth modular arithmetic is optimized and low-latency Montgomery modular multiplication hardware unit is designed. And then, the large-scale NTT tasks are divided into smaller sub-tasks through two-dimensional partitioning, which improves the parallelism of NTT computation and eliminates the data dependence among sub-tasks, thus reali-zing the pipeline among sub-tasks. Finally, the “data reordering” technique is introduced among multiple rounds of butterfly operations in a sub-task, which effectively alleviates the memory access requirements, thus realizing the bottom-level pipeline in each sub-task, among butterfly operations with different step sizes. This architecture can be flexibly scaled to different scales of FPGAs. The accelerator is prototyped on the AMD-Xilinx Alveo U50 card (UltraScale+XCU50 FPGA). To balance computing efficiency and flexibility, the OpenCL equipped with high-level synthesis (HLS) is used to implement the system. The evaluation results show that the NTT module performs 1.95 times faster than the one in PipeZK and the accelerator achieves 27.98 and 1.74 times speedup, 6.9 and 6 times energy efficiency improvement than AMD Ryzen 9 5900X respectively, when it is integrated into the well-known ZKP open-source project, bellman.

Keywords