IEEE Access (Jan 2024)

An Efficient Hardware/Software Co-Design for FALCON on Low-End Embedded Systems

  • Yongseok Lee,
  • Jonghee Youn,
  • Kevin Nam,
  • Heon Hui Jung,
  • Myunghyun Cho,
  • Jimyung Na,
  • Jong-Yeon Park,
  • Seungsu Jeon,
  • Bo Gyeong Kang,
  • Hyunyoung Oh,
  • Yunheung Paek

DOI
https://doi.org/10.1109/ACCESS.2024.3387489
Journal volume & issue
Vol. 12
pp. 57947 – 57958

Abstract


We propose an efficient FALCON accelerator called EFX, based on a HW/SW co-design. FALCON is a post-quantum cryptographic (PQC) scheme tailored as a digital signature algorithm (DSA). Our findings reveal that FALCON exhibits unique characteristics and structures that distinguish it from other PQC-DSAs. A key finding is that, unlike its counterparts, FALCON is not dominated by a single time-consuming task; instead, it processes a variety of tasks with comparable execution times. Consequently, conventional methods that focus on accelerating a few dominant tasks, which are generally effective for other algorithms, prove less efficient for FALCON, especially with respect to minimizing the silicon area used. To overcome this, we focus on fine-grained optimization of lower-level operations rather than of broader functional segments, aiming to boost performance while conserving hardware area. Moreover, to mitigate the potential degradation caused by limited hardware resources, we implement a pipelined execution strategy for the FALCON functions and refine the sampling function (a critical task that is challenging to accelerate because its algorithm is inherently sequential) so that it runs concurrently on both software and hardware, thus reducing latency. Our hardware design, synthesized at 300 MHz using Samsung’s 28 nm and 45 nm process technologies, demonstrates superior performance in generating FALCON signatures, with a $3.58\times$ improvement in clock cycles over an existing hardware accelerator. EFX occupies 38K $\mu m^{2}$ and 74K $\mu m^{2}$ in the 28 nm and 45 nm processes, respectively, which is small compared to other PQC accelerators.

Keywords