IEEE Access (Jan 2024)

High-Speed NTT Accelerator for CRYSTAL-Kyber and CRYSTAL-Dilithium

  • Trong-Hung Nguyen,
  • Binh Kieu-Do-Nguyen,
  • Cong-Kha Pham,
  • Trong-Thuc Hoang

DOI
https://doi.org/10.1109/ACCESS.2024.3371581
Journal volume & issue
Vol. 12
pp. 34918 – 34930

Abstract

The efficiency of polynomial multiplication strongly affects the performance of lattice-based post-quantum cryptosystems. In this research, we propose a high-speed hardware architecture to accelerate polynomial multiplication based on the Number Theoretic Transform (NTT) in CRYSTAL-Kyber and CRYSTAL-Dilithium. We design a Digital Signal Processing (DSP) architecture for modular multiplication in butterfly and Point-Wise Multiplication (PWM) operations. Our method reduces the critical path delay of an $n$-bit multiplier to that of a $(2n-2)$-bit adder, optimizing both area and speed. These dedicated DSPs are employed in butterfly and PWM operations, completely eliminating the pre-processing and post-processing steps of the NTT. Furthermore, we introduce a novel unified pipelined architecture for the NTT and Inverse NTT (INTT) of Kyber and Dilithium, with corresponding high-speed (Radix-2) and ultra-high-speed (Radix-4) versions. Lastly, we construct a complete hardware accelerator for polynomial matrix-vector multiplication in Kyber. Field-Programmable Gate Array (FPGA) implementation results show that our designs improve execution time by $3.4\times$–$9.6\times$ for the NTT in Dilithium and by $1.36\times$–$34.16\times$ for Kyber polynomial multiplication, compared to previously reported studies. Additionally, the hardware footprint results indicate that our proposed architectures achieve a 44%–96% improvement in Area-Time-Product (ATP). The proposed architectures are efficient and well-suited for high-performance lattice-based cryptography systems.
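
For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of NTT-based multiplication in the ring $\mathbb{Z}_q[x]/(x^n+1)$, using the Dilithium modulus $q = 8380417$ and $n = 256$. It is a textbook formulation, not the authors' hardware design: it keeps the explicit pre-scaling and post-scaling by powers of $\psi$ that the abstract says the proposed DSPs eliminate, and it uses a recursive radix-2 transform rather than a pipelined one. The function names (`find_psi`, `ntt`, `negacyclic_mul`, `schoolbook_mul`) are illustrative assumptions, and `pow(x, -1, q)` assumes Python 3.8 or later.

```python
Q = 8380417  # Dilithium modulus, 2**23 - 2**13 + 1
N = 256      # ring is Z_Q[x] / (x^N + 1)

def find_psi(q, n):
    """Search for a primitive 2n-th root of unity modulo the prime q."""
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:  # psi^n == -1, so the order is exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity found")

def ntt(a, omega, q):
    """Recursive radix-2 cyclic NTT; omega is a primitive len(a)-th root of unity."""
    n = len(a)
    if n == 1:
        return a[:]
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q                 # modular multiplication by the twiddle factor
        out[k] = (even[k] + t) % q         # butterfly: sum output
        out[k + n // 2] = (even[k] - t) % q  # butterfly: difference output
        w = w * omega % q
    return out

def negacyclic_mul(a, b, q=Q):
    """Multiply a and b modulo (x^n + 1, q) using the NTT."""
    n = len(a)
    psi = find_psi(q, n)
    omega = psi * psi % q
    psi_pows = [pow(psi, i, q) for i in range(n)]
    # Pre-processing: scale coefficient i by psi^i to turn the negacyclic
    # convolution into a cyclic one.
    a_hat = ntt([x * p % q for x, p in zip(a, psi_pows)], omega, q)
    b_hat = ntt([x * p % q for x, p in zip(b, psi_pows)], omega, q)
    # Point-wise multiplication (PWM) in the NTT domain.
    c_hat = [x * y % q for x, y in zip(a_hat, b_hat)]
    # Inverse NTT: forward transform with omega^-1, then divide by n.
    c = ntt(c_hat, pow(omega, -1, q), q)
    n_inv = pow(n, -1, q)
    psi_inv = pow(psi, -1, q)
    # Post-processing: undo the psi^i scaling.
    return [x * n_inv % q * pow(psi_inv, i, q) % q for i, x in enumerate(c)]

def schoolbook_mul(a, b, q=Q):
    """Reference O(n^2) multiplication modulo (x^n + 1, q)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q  # x^n wraps to -1
    return c
```

Comparing `negacyclic_mul(a, b)` with `schoolbook_mul(a, b)` on random inputs of length `N` is a simple way to check the sketch. Kyber is not covered here: its modulus $q = 3329$ has no primitive 512th root of unity, so its transform stops one layer early and needs a different point-wise step.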

Keywords