Faster NTRU on ARM Cortex-M4 With TMVP-Based Multiplication

Paksoy, İrem; Cenk, Murat

doi:10.1109/tcsi.2022.3191111

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, vol. 69, no. 10, pp. 4083-4092, 2022 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 69 Issue: 10
Publication Date: 2022
DOI: 10.1109/tcsi.2022.3191111
Journal Name: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
Page Numbers: pp. 4083-4092
Keywords: Arithmetic, Complexity theory, NIST, Quantum computing, Encryption, Transforms, Standardization, Lattice-based, post-quantum, ARM Cortex-M4, NTRU, Toeplitz, TMVP, COMPLEXITY, COMPUTATION, MULTIPLIERS
Affiliated with Middle East Technical University: Yes

Abstract

This paper focuses on speeding up NTRU, one of the lattice-based finalists of the NIST PQC competition, by improving the ring multiplication. The Number Theoretic Transform (NTT), Toom-Cook, and Karatsuba are the most commonly used algorithms for implementing NTRU. In this paper, we propose Toeplitz matrix-vector product (TMVP) based multiplication algorithms for all parameter sets of NTRU. We implement the proposed algorithms on ARM Cortex-M4. The results show that the TMVP-based multiplication algorithms we propose are more efficient than the others in the literature in most cases. Our algorithm for ntruhps2048509 outperforms the Toom-Cook and NTT methods in the literature by 25.4% and 21.5%, respectively. We also observe the impact of these improvements on the overall performance of NTRU. We speed up the key generation, encryption, decryption, encapsulation, and decapsulation algorithms of ntruhps2048509 by 12.5%, 14.3%, 17.7%, 3.9%, and 14.7%, respectively, compared to the state-of-the-art implementation. Moreover, our algorithms require less stack space than the others.
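
To illustrate the general idea behind TMVP-based ring multiplication, the following is a minimal Python sketch and not the paper's algorithm. It assumes multiplication in Z_q[x]/(x^n - 1), writes the product as a circulant (hence Toeplitz) matrix times a vector, and applies the standard two-way TMVP split that replaces one size-n product with three size-n/2 products. All function names, the `threshold` parameter, and the fallback to schoolbook multiplication for odd sizes are assumptions made for this illustration; the abstract does not state which split formulas, padding, or thresholds the authors use.

```python
def toeplitz_schoolbook(t, b, q):
    """Multiply the n x n Toeplitz matrix T[i][j] = t[i - j + n - 1] by b, mod q."""
    n = len(b)
    return [sum(t[i - j + n - 1] * b[j] for j in range(n)) % q for i in range(n)]


def tmvp(t, b, q, threshold=2):
    """Two-way recursive Toeplitz matrix-vector product (illustrative sketch).

    t has length 2n - 1 and encodes T[i][j] = t[i - j + n - 1].
    Writing T = [[T1, T0], [T2, T1]] and b = [b0, b1]:
        P1 = T1 (b0 + b1), P2 = (T0 - T1) b1, P3 = (T2 - T1) b0
        T b = [P1 + P2, P1 + P3]
    """
    n = len(b)
    if n <= threshold or n % 2:
        # Assumption: small or odd sizes simply fall back to schoolbook.
        return toeplitz_schoolbook(t, b, q)
    h = n // 2
    # Each half-size block is itself Toeplitz and needs 2h - 1 entries.
    t0, t1, t2 = t[:2 * h - 1], t[h:3 * h - 1], t[2 * h:]
    b0, b1 = b[:h], b[h:]
    p1 = tmvp(t1, [(x + y) % q for x, y in zip(b0, b1)], q, threshold)
    p2 = tmvp([(x - y) % q for x, y in zip(t0, t1)], b1, q, threshold)
    p3 = tmvp([(x - y) % q for x, y in zip(t2, t1)], b0, q, threshold)
    top = [(x + y) % q for x, y in zip(p1, p2)]
    bottom = [(x + y) % q for x, y in zip(p1, p3)]
    return top + bottom


def cyclic_mul_tmvp(a, b, q):
    """Compute a * b in Z_q[x]/(x^n - 1) via a Toeplitz matrix-vector product."""
    n = len(a)
    # Circulant matrix of a: T[i][j] = a[(i - j) mod n].
    t = [a[(k - (n - 1)) % n] for k in range(2 * n - 1)]
    return tmvp(t, b, q)


def cyclic_mul_reference(a, b, q):
    """Schoolbook cyclic convolution, used only to check the sketch."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c
```

For example, `cyclic_mul_tmvp([1, 1, 0, 0], [0, 1, 0, 0], 2048)` multiplies 1 + x by x modulo x^4 - 1 and yields the coefficients of x + x^2. The ring degree of ntruhps2048509 is the prime 509, so a practical implementation would need padding or a different split to reach sizes that divide evenly; this sketch does not attempt that.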