Faster NTRU on ARM Cortex-M4 With TMVP-Based Multiplication


Paksoy İ., Cenk M.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, vol.69, no.10, pp.4083-4092, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 69 Issue: 10
  • Publication Date: 2022
  • DOI Number: 10.1109/tcsi.2022.3191111
  • Journal Name: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
  • Page Numbers: pp.4083-4092
  • Keywords: Arithmetic, Complexity theory, NIST, Quantum computing, Encryption, Transforms, Standardization, Lattice-based, post-quantum, ARM Cortex-M4, NTRU, Toeplitz, TMVP, Complexity, Computation, Multipliers
  • Middle East Technical University Affiliated: Yes

Abstract

This paper focuses on speeding up NTRU, one of the lattice-based finalists of the NIST PQC competition, by improving its ring multiplication. The Number Theoretic Transform (NTT), Toom-Cook, and Karatsuba are the multiplication algorithms most commonly used in NTRU implementations. In this paper, we propose Toeplitz matrix-vector product (TMVP) based multiplication algorithms for all parameter sets of NTRU and implement them on ARM Cortex-M4. The results show that the proposed TMVP-based multiplication algorithms are more efficient than the others in the literature in most cases. Our algorithm for ntruhps2048509 outperforms the Toom-Cook and NTT methods in the literature by 25.4% and 21.5%, respectively. We also evaluate the impact of these improvements on the overall performance of NTRU: we speed up the key generation, encryption, decryption, encapsulation, and decapsulation algorithms of ntruhps2048509 by 12.5%, 14.3%, 17.7%, 3.9%, and 14.7%, respectively, compared with the state-of-the-art implementation. Moreover, our algorithms require less stack space than the others.
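The abstract does not spell out the algorithms, so the following is only a minimal illustrative sketch, not the authors' Cortex-M4 implementation. It assumes the HPS ring Z_q[x]/(x^n - 1), where multiplying by a fixed polynomial a is a circulant (and therefore Toeplitz) matrix-vector product, and it uses the textbook two-way TMVP split, which replaces four half-size products with three. The names `tmvp`, `ring_mul_tmvp`, and `THRESHOLD`, the schoolbook cutoff of 16, and the zero-padding of odd sizes (needed because n = 509 is odd) are hypothetical choices for this sketch; the paper's actual splitting strategy for each parameter set may differ.

```python
"""Sketch (hypothetical, not the paper's code): ring multiplication in
Z_q[x]/(x^n - 1) computed as a Toeplitz matrix-vector product (TMVP)
with a recursive two-way split."""
import random

THRESHOLD = 16  # hypothetical cutoff below which schoolbook TMVP is used


def schoolbook_tmvp(t, v):
    """Return T*v, where the n x n Toeplitz matrix T is stored as its
    2n-1 diagonal values: T[i][j] = t[i - j + n - 1]."""
    n = len(v)
    return [sum(t[i - j + n - 1] * v[j] for j in range(n)) for i in range(n)]


def tmvp(t, v):
    """Two-way TMVP split with T = [[T1, T0], [T2, T1]] and v = [v0; v1]:
    P0 = (T0 + T1) v1, P1 = (T1 + T2) v0, P2 = T1 (v0 - v1),
    T*v = [P0 + P2; P1 - P2], i.e. three half-size TMVPs instead of four."""
    n = len(v)
    if n <= THRESHOLD:
        return schoolbook_tmvp(t, v)
    if n % 2:
        # Odd size (assumed strategy): embed T as the top-left block of an
        # (n+1) x (n+1) Toeplitz matrix, pad v with a zero, drop the extra row.
        return tmvp([0] + t + [0], v + [0])[:n]
    k = n // 2
    # Diagonal-value slices of the three distinct k x k Toeplitz blocks.
    t0, t1, t2 = t[0:2 * k - 1], t[k:3 * k - 1], t[2 * k:4 * k - 1]
    v0, v1 = v[:k], v[k:]
    p0 = tmvp([x + y for x, y in zip(t0, t1)], v1)
    p1 = tmvp([x + y for x, y in zip(t1, t2)], v0)
    p2 = tmvp(t1, [x - y for x, y in zip(v0, v1)])
    return [x + y for x, y in zip(p0, p2)] + [x - y for x, y in zip(p1, p2)]


def ring_mul_tmvp(a, b, q):
    """c = a*b in Z_q[x]/(x^n - 1) using the circulant matrix
    M[i][j] = a[(i - j) mod n], stored in the Toeplitz diagonal format."""
    n = len(a)
    t = [a[(d - (n - 1)) % n] for d in range(2 * n - 1)]
    return [c % q for c in tmvp(t, b)]


def ring_mul_schoolbook(a, b, q):
    """Reference product in Z_q[x]/(x^n - 1), used to check the sketch."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]
    return [x % q for x in c]


if __name__ == "__main__":
    random.seed(1)
    n, q = 509, 2048  # ring size and modulus of ntruhps2048509
    a = [random.randrange(q) for _ in range(n)]
    b = [random.randrange(3) - 1 for _ in range(n)]  # ternary coefficients
    assert ring_mul_tmvp(a, b, q) == ring_mul_schoolbook(a, b, q)
    print("TMVP product matches schoolbook")
```

Since q = 2048 is a power of two, the sketch simply reduces modulo q at the end; on a 32-bit target the same identities hold with native wrap-around arithmetic, which is one reason Toom-Cook, Karatsuba, and TMVP-style splits suit NTRU's non-NTT-friendly modulus.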