Polynomial multiplication on embedded vector architectures

Hanno Becker; Jose Maria Bermudo Mera; Angshuman Karmakar; Joseph Yiu; Ingrid Verbauwhede

doi:10.46586/tches.v2022.i1.482-505

Transactions on Cryptographic Hardware and Embedded Systems (Nov 2021)

Affiliations

Hanno Becker: Arm Ltd, Cambridge, UK
Jose Maria Bermudo Mera: imec-COSIC, KU Leuven, Kasteelpark Arenberg 10, Bus 2452, B-3001 Leuven-Heverlee, Belgium
Angshuman Karmakar: imec-COSIC, KU Leuven, Kasteelpark Arenberg 10, Bus 2452, B-3001 Leuven-Heverlee, Belgium
Joseph Yiu: Arm Ltd, Cambridge, UK
Ingrid Verbauwhede: imec-COSIC, KU Leuven, Kasteelpark Arenberg 10, Bus 2452, B-3001 Leuven-Heverlee, Belgium

DOI: https://doi.org/10.46586/tches.v2022.i1.482-505
Journal volume & issue: Vol. 2022, no. 1

Abstract

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice-based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms are an active area of research, and this article contributes to this line of work. Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those improvements on the Arm® Cortex®-M4 CPU, as well as on the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Arm® Helium™ technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. We show that, despite the Cortex-M55 processor being single-issue and in-order and offering only 8 vector registers compared to 32 on A-profile SIMD architectures like Arm® Neon™ technology and the Scalable Vector Extension (SVE), careful register management and instruction scheduling yield a 3× to 5× performance improvement over already highly optimized implementations on Cortex-M4, while maintaining the low area and energy profile necessary for use in the embedded market. Finally, as a real-world application, we integrate our multiplication techniques into the post-quantum key-encapsulation mechanism Saber.
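
For readers unfamiliar with the Karatsuba strategy named in the abstract, the following is a minimal scalar sketch in Python, not the authors' vectorized implementation. It multiplies two polynomials with three half-size products instead of four and then reduces the result modulo x^N + 1 with coefficients modulo Q. The values N = 256 and Q = 2^13, the recursion threshold, and all function names are assumptions chosen for illustration; none of them come from the text above.

```python
Q = 1 << 13  # assumed power-of-two coefficient modulus (illustrative)
N = 256      # assumed number of coefficients per polynomial (illustrative)


def schoolbook(a, b):
    """Plain quadratic product of two coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def karatsuba(a, b, threshold=16):
    """Karatsuba product of two equal-length lists.

    Assumes len(a) == len(b) and that the length is a power of two,
    so every split yields two halves of equal size.
    """
    n = len(a)
    if n <= threshold:
        return schoolbook(a, b)
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    low = karatsuba(a0, b0, threshold)    # a0 * b0
    high = karatsuba(a1, b1, threshold)   # a1 * b1
    # (a0 + a1) * (b0 + b1) gives the cross terms after subtracting low and high.
    mid = karatsuba([x + y for x, y in zip(a0, a1)],
                    [x + y for x, y in zip(b0, b1)],
                    threshold)
    out = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        out[i] += low[i]
        out[i + h] += mid[i] - low[i] - high[i]
        out[i + 2 * h] += high[i]
    return out


def reduce_negacyclic(c, n=N, q=Q):
    """Reduce a product modulo x^n + 1 and modulo q (x^n wraps around to -1)."""
    out = [0] * n
    for i, ci in enumerate(c):
        if i < n:
            out[i] += ci
        else:
            out[i - n] -= ci
    return [x % q for x in out]


def polymul(a, b):
    """Multiply two length-N polynomials in Z_Q[x] / (x^N + 1)."""
    return reduce_negacyclic(karatsuba(a, b))
```

As a quick check of the wrap-around rule, multiplying x by x^(N-1) with `polymul` gives x^N, which reduces to -1, so the result has Q - 1 in position 0 and zeros elsewhere.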

Published in Transactions on Cryptographic Hardware and Embedded Systems

ISSN: 2569-2925 (Online)
Publisher: Ruhr-Universität Bochum
Country of publisher: Germany
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://tches.iacr.org
