A Configurable RO-PUF for Securing Embedded Systems Implemented on Programmable Devices M.C. Martínez-Rodríguez, E. Camacho-Ruiz, P. Brox and S. Sánchez-Solano Journal Paper · Electronics, vol. 10, no. 16, article 1957, 2021 resumendoipdf
Improving the security of electronic devices that support innovative critical services (digital administrative services, e-health, e-shopping, and on-line banking) is essential to lay the foundations of a secure digital society. Security schemes based on Physical Unclonable Functions (PUFs) take advantage of intrinsic characteristics of the hardware for the online generation of unique digital identifiers and cryptographic keys that allow to ensure the protection of the devices against counterfeiting and to preserve data privacy. This paper tackles the design of a configurable Ring Oscillator (RO) PUF that encompasses several strategies to provide an efficient solution in terms of area, timing response, and performance. RO-PUF implementation on programmable logic devices is conceived to minimize the use of available resources, while operating speed can be optimized by properly selecting the size of the elements used to obtain the PUF response. The work also describes the interface added to the PUF to facilitate its incorporation as hardware Intellectual Property (IP)-modules into embedded systems. The performance of the RO-PUF is proven with an extensive battery of tests, which are executed to analyze the influence of different test strategies on the PUF quality indexes. The configurability of the proposed RO-PUF allows establishing the most suitable ‘cost/performance/security-level’ trade-off for a certain application.
Timing-Optimized Hardware Implementation to Accelerate Polynomial Multiplication in the NTRU Algorithm E. Camacho-Ruiz, S. Sánchez-Solano, P. Brox and M.C. Martínez-Rodríguez Journal Paper · ACM Journal on Emerging Technologies in Computing Systems, vol. 17, no. 3, article 35, 2021 resumendoi
Post-quantum cryptographic algorithms have emerged to secure communication channels between electronic devices faced with the advent of quantum computers. The performance of post-quantum cryptographic algorithms on embedded systems has to be evaluated to achieve a good trade-off between required resources (area) and timing. This work presents two optimized implementations to speed up the NTRUEncrypt algorithm on a system-on-chip. The strategy is based on accelerating the most time-consuming operation that is the truncated polynomial multiplication. Hardware dedicated modules for multiplication are designed by exploiting the presence of consecutive zeros in the coefficients of the blinding polynomial. The results are validated on a PYNQ-Z2 platform that includes a Zynq-7000 SoC from Xilinx and supports a Python-based programming environment. The optimized version that exploits the presence of double, triple, and quadruple consecutive zeros offers the best performance in timing, in addition to considerably reducing the possibility of an information leakage against an eventual attack on the device, making it practically negligible.
Accelerating the Development of NTRU Algorithm on Embedded Systems E. Camacho-Ruiz, M.C. Martínez-Rodríguez, S. Sánchez-Solano and P. Brox Conference · Conference on Design of Circuits and Integrated Systems DCIS 2020 resumen
The advent of quantum computers represents a serious threat to current public key cryptosystems. To face this problem the so-called Post-Quantum (PQ) cryptographic solutions are being developed, many of which have been presented to the competition launched by NIST to evaluate proposals of PQ cryptography for standardization and deployment. This paper addresses the implementation of the NTRU PQ cryptographic algorithm on embedded systems. Using a Python-based development framework to accelerate the design process, software-only and hybrid (HW/SW) implementations of NTRU are evaluated in terms of operation speed and resource consumption on a System-on-Chip (SoC). Results show that hardware implementation of critical operations in conjuction with a Python+C programming allows an increase in performance that ranges from 130 to 450 depending on the selected scenario to use the algorithm.