IEEE Access (Jan 2022)
Neuromorphic In-Memory RRAM NAND/NOR Circuit Performance Analysis in a CNN Training Framework on the Edge for Low Power IoT
Abstract
Training a CNN involves computationally intensive optimization algorithms that fit the network to a training dataset and update the network weights for subsequent inference and pattern classification. Hence, in-memory computation would enable a highly power-efficient, low-latency on-the-edge CNN training technique by avoiding the memory wall created by external memory read/write operations (for off-chip instruction and data transfer). A memory write-verify and re-program technique can control RRAM variability, but verification and re-programming are complex processes that require additional resources for a practical implementation of the verification circuit. In this study, we demonstrate a practical First-In Max-Out (FIMO)-based cache memory, the Maximum Count Binary Comparator (MCBC) layer, using 1T3R, 1T5R, and 1T7R RRAM structures together with a probability-based accuracy-improvement architecture that avoids the conventional verification process. We constructed a 10-layer modified MobileNet with filter sizes ranging from 32 to 512 and trained it on the Traffic Sign Recognition Database (TSRD) using a three-tier abstraction simulation learning framework: (1) a high-level, 10-layer CNN implementation in Python + TensorFlow; (2) Verilog HDL-based FP32MUL and FP32ADD (32-bit floating-point multiplier and adder) circuits constructed from RRAM NAND gates using 1T2R structures; and (3) a digital Look-Up-Table (LUT) model for RRAM variability. An edge-learning framework (for the forward pass) is demonstrated using digital RRAM NAND/NOR universal gates integrated with the MCBC layer to partially circumvent the impact of RRAM variability and to quantify its effect on CNN training prediction accuracy for a 65 nm CMOS OxRAM (TiN/HfO2/Hf/TiN) device at current compliances of 5, 10, and $50~\mu \text{A}$ for low-power IoT applications. The MCBC layer was simulated using a SPICE model; its estimated chip layout is $1150\times 1230$ nm$^2$ per logical gate input, and it improved the overall prediction accuracy from 10% to 60% as the logical operations of the NOR gate were repeated for 1, 3, 5, and 7 cycles.
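For illustration, the sketch below shows what the first tier of the framework described above could look like: a 10-layer, MobileNet-style CNN built with Python and TensorFlow, with filter counts spanning 32 to 512. The exact layer arrangement, input resolution, and class count (NUM_CLASSES) are assumptions for this sketch, not the configuration reported in the paper.

```python
# Minimal sketch of the high-level tier: a 10-layer, MobileNet-style CNN in
# TensorFlow/Keras. Filter counts (32 -> 512), input shape, and NUM_CLASSES
# are illustrative placeholders, not the paper's exact configuration.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 58            # assumed class count for a traffic-sign dataset
INPUT_SHAPE = (64, 64, 3)   # assumed input resolution

def separable_block(x, filters, stride=1):
    """Depthwise-separable convolution block, as used in MobileNet."""
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, 1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def build_modified_mobilenet():
    inputs = layers.Input(shape=INPUT_SHAPE)
    # Stage 1: standard convolution with 32 filters
    x = layers.Conv2D(32, 3, strides=2, padding="same", use_bias=False)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    # Stages 2-10: nine separable blocks with filters growing from 64 to 512
    for filters, stride in [(64, 1), (128, 2), (128, 1), (256, 2),
                            (256, 1), (512, 2), (512, 1), (512, 1), (512, 1)]:
        x = separable_block(x, filters, stride)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_modified_mobilenet()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

In the full framework, the floating-point multiply and add operations of such a network would be mapped onto the RRAM-based FP32MUL/FP32ADD circuits of the second tier, with device variability injected through the LUT model of the third tier.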
Keywords