IEEE Access (Jan 2024)

A Highly Parallel DRAM Architecture to Mitigate Large Access Latency and Improve Energy Efficiency of Modern DRAM Systems

  • Tareq A. Alawneh,
  • Ahmed A. M. Sharadqh,
  • Ashraf Al Sharah,
  • Emad Awada,
  • Jawdat S. Alkasassbeh,
  • Ayman Y. Al-Rawashdeh,
  • Aws Al-Qaisi

DOI
https://doi.org/10.1109/ACCESS.2024.3512176
Journal volume & issue
Vol. 12
pp. 182998–183023

Abstract


Modern Dynamic Random Access Memory (DRAM) banks can operate in parallel, allowing multiple memory accesses to be serviced concurrently across interleaved banks. Modern DRAM systems support this feature with large pages that exploit the locality captured in the row buffers. However, a key factor limiting full use of Bank-Level Parallelism (BLP) is the increased contention among processing cores in multi-core systems, which can leave the locality available in the row buffers poorly exploited. The result is degraded DRAM performance and significant energy waste, because large pages are activated only to access a small amount of data, typically 64 bytes, which in turn raises the likelihood of hitting critical power and timing constraints. In this article, we propose a highly parallel DRAM architecture designed to mitigate the adverse effects that increased memory contention in multi-core systems has on DRAM performance and energy efficiency. Specifically, the proposed architecture incorporates cost-effective modifications that reduce resource sharing among DRAM pages within a sub-array, enabling concurrent access to wordlines of varying sizes that belong to different DRAM pages within the same sub-array of a memory bank. The improved utilization of the bank's row-buffer space significantly enhances DRAM performance and energy efficiency by increasing both intra- and inter-sub-array parallelism, while also relaxing critical DRAM timing restrictions and enabling access at finer DRAM page granularities. Our experimental results for quad-core multi-program workloads show that the proposed DRAM architecture delivers significant improvements in average memory access latency and overall DRAM energy consumption over the baseline, outperforming previously proposed mechanisms.
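The abstract's energy argument rests on row-buffer overfetch: a full page is activated to serve a 64-byte request, so most of the latched data goes unused. The sketch below is a minimal, illustrative Python model of that effect; the row size, sub-row sizes, and per-byte activation energy are assumptions chosen for illustration, not figures from the paper.

```python
# Toy overfetch model (illustrative only; all parameters are assumptions,
# not values reported in the paper). It contrasts activating a full DRAM row
# to serve one 64-byte cache-line access with activating a smaller sub-row,
# the kind of fine-grained activation the proposed architecture targets.

ROW_SIZE_BYTES = 8 * 1024        # assumed page (row) size per bank
CACHE_LINE_BYTES = 64            # typical request granularity cited in the abstract
ACT_ENERGY_PER_BYTE_PJ = 1.0     # assumed (nominal) activation energy per byte

def activation_energy_pj(activated_bytes: int) -> float:
    """Energy (pJ) spent latching `activated_bytes` into the row buffer."""
    return activated_bytes * ACT_ENERGY_PER_BYTE_PJ

def overfetch_ratio(activated_bytes: int, useful_bytes: int) -> float:
    """Fraction of the activated data that the request never uses."""
    return 1.0 - useful_bytes / activated_bytes

if __name__ == "__main__":
    # Full row, one-eighth of a row, and a cache-line-sized wordline.
    for activated in (ROW_SIZE_BYTES, ROW_SIZE_BYTES // 8, CACHE_LINE_BYTES):
        e = activation_energy_pj(activated)
        w = overfetch_ratio(activated, CACHE_LINE_BYTES)
        print(f"activate {activated:5d} B -> {e:8.1f} pJ, {w:6.1%} of activated data unused")
```

Under heavy multi-core contention, row-buffer hit rates drop, so most requests pay the full-row activation cost shown in the first case; activating finer-grained wordlines, as the proposed architecture allows, shrinks both the energy per access and the unused fraction.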

Keywords