IEEE Access (Jan 2024)

Enhancing On-Device DNN Inference Performance With a Reduced Retention-Time MRAM-Based Memory Architecture

  • Munhyung Lee
  • Taehan Lee
  • Junwon Yeo
  • Hyukjun Lee

DOI: https://doi.org/10.1109/ACCESS.2024.3496906
Journal volume & issue: Vol. 12, pp. 171295–171303

Abstract

As applications using deep neural networks (DNNs) are increasingly deployed on mobile devices, researchers are exploring various methods to achieve low energy consumption and high performance. Recently, advances in STT-MRAM have shown promise in offering non-volatility, high performance, and low energy consumption when used to replace DRAM in main memory. Most of the memory space used in DNN applications is occupied by weight and activation data. Typically, the contents of weight data remain unchanged during inference, whereas activation data are modified to store intermediate results across the layers of the DNN. As large DNN applications consume significant energy and require high performance, STT-MRAM is an ideal candidate to fully or partially replace DRAM in main memory. However, the long write latency of STT-MRAM compared to DRAM presents a performance bottleneck. In this work, we propose a reduced retention-time MRAM-based main memory to address this issue. We divide the MRAM into multiple partitions, with each partition implemented with a different retention time tailored to DNN applications. In this approach, the DNN weights can be assigned to the long retention-time partition, while the activation data can be allocated to the short retention-time partition, optimizing for the varying characteristics of data reuse. To achieve high performance, we propose two mapping schemes, intra-segment and inter-segment circular buffers, which dynamically map DNN activation data (i.e., virtual pages of streaming data) to physical pages in a circular fashion to exploit the reuse patterns of DNN data. These circular buffers are mapped to the short retention-time MRAM partition as much as possible. The intra-segment circular buffer mapping scheme achieves an average improvement of 14.4% in bandwidth and 12.6% in inference latency compared to DRAM for on-device DNN applications. Furthermore, the inter-segment circular buffer mapping scheme offers average improvements of 11.1% in bandwidth and 11.2% in latency using only 16 MBytes of the short retention-time partition.
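
To make the circular mapping concrete, below is a minimal C sketch, not the authors' implementation: it assumes 4 KiB pages and the 16 MByte short retention-time partition mentioned in the abstract (the page size and all identifiers are illustrative assumptions), and shows how virtual pages of streaming activation data wrap onto a fixed pool of physical pages in the short retention-time partition.

```c
/* Minimal sketch (not the paper's implementation) of the circular
 * page-mapping idea: streaming activation pages are mapped onto a
 * small pool of physical pages in the short retention-time MRAM
 * partition in a circular fashion, while weight pages would reside
 * in the long retention-time partition. Sizes are assumptions. */
#include <stdio.h>

#define PAGE_SIZE        4096u          /* assumed 4 KiB pages             */
#define SHORT_PART_BYTES (16u << 20)    /* 16 MByte short-retention pool   */
#define SHORT_PART_PAGES (SHORT_PART_BYTES / PAGE_SIZE)

/* Map a virtual activation page to a physical page inside the short
 * retention-time partition. Activations are produced and consumed
 * layer by layer, so older frames can be overwritten once their data
 * have been reused; a simple modulo mapping models that reuse here. */
static unsigned map_activation_page(unsigned virt_page)
{
    return virt_page % SHORT_PART_PAGES;  /* circular reuse of the pool */
}

int main(void)
{
    /* Stream more virtual pages than the pool holds to show wrap-around. */
    for (unsigned vp = 0; vp < SHORT_PART_PAGES + 3; ++vp) {
        if (vp < 2 || vp >= SHORT_PART_PAGES - 1)
            printf("virtual page %u -> physical page %u\n",
                   vp, map_activation_page(vp));
    }
    return 0;
}
```

The modulo step is what makes the buffer circular: because each layer's activations are consumed by the next layer and then become dead data, a physical frame can safely be overwritten by a newer virtual page, which keeps the short retention-time footprint bounded regardless of how many activation pages the DNN streams.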

Keywords