IEEE Access (Jan 2024)
A Flexible and Parallel Hardware Accelerator for Forward and Inverse Number Theoretic Transform
Abstract
This paper presents FP-NTT, an efficient, flexible and parallel hardware accelerator for polynomial multiplication based on the number theoretic transform (NTT). The proposed architecture addresses flexibility and performance requirements together. Flexibility is achieved by supporting three operating modes: (i) forward NTT only, using a Cooley-Tukey butterfly unit (CT-BFU), (ii) inverse NTT only, using a Gentleman-Sande butterfly unit (GS-BFU), and (iii) forward and inverse NTT simultaneously. Performance is enhanced by operating one CT-BFU, one GS-BFU, and four block RAMs in parallel. A dedicated control unit coordinates these resources across the three modes. The design is evaluated using a throughput/area metric. Implementation results are reported after place-and-route on various Xilinx field-programmable gate array (FPGA) devices. Specifically, on a Virtex-7 FPGA, FP-NTT operates at 250 MHz, utilises 1026 slices, and requires $4.61\,\mu s$ and $5.12\,\mu s$ for forward and inverse NTT computations, respectively. The resulting throughput/area is 211.41 and 190.36 for forward and inverse computations, respectively. A comparison with state-of-the-art designs emphasises the suitability of the FP-NTT accelerator for high-speed cryptographic applications.
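To make the two butterfly types concrete, the following is a minimal software sketch, not the hardware design described in this paper. It assumes a plain cyclic NTT with illustrative toy parameters ($n = 8$, $q = 17$, $\omega = 2$) that are not taken from the paper, and the function names are hypothetical. The forward transform uses Cooley-Tukey butterflies and the inverse uses Gentleman-Sande butterflies, mirroring the roles of the CT-BFU and GS-BFU.

```python
def bit_reverse(values):
    """Return a copy of values permuted into bit-reversed index order."""
    n = len(values)
    bits = n.bit_length() - 1  # n is assumed to be a power of two, n >= 2
    return [values[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]


def ct_butterfly(a, b, w, q):
    """Cooley-Tukey butterfly: multiply by the twiddle first, then add/subtract."""
    t = (w * b) % q
    return (a + t) % q, (a - t) % q


def gs_butterfly(a, b, w, q):
    """Gentleman-Sande butterfly: add/subtract first, then multiply by the twiddle."""
    return (a + b) % q, ((a - b) * w) % q


def forward_ntt(poly, omega, q):
    """Cyclic forward NTT using CT butterflies (bit-reversed in, natural out)."""
    a = bit_reverse(poly)
    n = len(a)
    length = 2
    while length <= n:
        w_step = pow(omega, n // length, q)  # primitive length-th root of unity
        for start in range(0, n, length):
            w = 1
            for j in range(length // 2):
                i, k = start + j, start + j + length // 2
                a[i], a[k] = ct_butterfly(a[i], a[k], w, q)
                w = (w * w_step) % q
        length *= 2
    return a


def inverse_ntt(values, omega, q):
    """Cyclic inverse NTT using GS butterflies (natural in, bit-reversed out)."""
    a = list(values)
    n = len(a)
    omega_inv = pow(omega, -1, q)  # modular inverse, requires Python 3.8+
    length = n
    while length >= 2:
        w_step = pow(omega_inv, n // length, q)
        for start in range(0, n, length):
            w = 1
            for j in range(length // 2):
                i, k = start + j, start + j + length // 2
                a[i], a[k] = gs_butterfly(a[i], a[k], w, q)
                w = (w * w_step) % q
        length //= 2
    n_inv = pow(n, -1, q)  # final scaling by 1/n
    return [(x * n_inv) % q for x in bit_reverse(a)]


# Toy parameters (assumptions for illustration): 2 is a primitive 8th root of unity mod 17.
Q, OMEGA = 17, 2
example = [1, 2, 3, 4, 5, 6, 7, 8]
transformed = forward_ntt(example, OMEGA, Q)
assert inverse_ntt(transformed, OMEGA, Q) == example  # round trip recovers the input
```

Cryptographic schemes commonly use a negacyclic variant with a much larger modulus; the cyclic form is used here only because it keeps the butterfly structure easy to follow.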
Keywords