IEEE Journal on Exploratory Solid-State Computational Devices and Circuits (Jan 2021)

Thermal-Aware Design Space Exploration of 3-D Systolic ML Accelerators

  • Rahul Mathur,
  • Ajay Krishna Ananda Kumar,
  • Lizy John,
  • Jaydeep P. Kulkarni

DOI
https://doi.org/10.1109/JXCDC.2021.3092436
Journal volume & issue
Vol. 7, no. 1
pp. 70–78

Abstract

Machine learning (ML) accelerators have a broad spectrum of use cases that pose different requirements on accelerator design for latency, energy, and area. In the case of systolic array-based ML accelerators, this puts different constraints on processing element (PE) array dimensions and SRAM buffer sizes. The 3-D integration packs more compute or memory in the same 2-D footprint, which can be utilized to build more powerful or energy-efficient accelerators. However, 3-D also expands the design space of ML accelerators by additionally including different possible ways of partitioning the PE array and SRAM buffers among the vertical tiers. Moreover, the partitioning approach may also have different thermal implications. This work provides a systematic framework for performing system-level design space exploration of 3-D systolic accelerators. Using this framework, different 3-D-partitioned accelerator configurations are proposed and evaluated. The 3-D-stacked accelerator designs are modeled using the hybrid wafer bonding technique with a 1.44-µm pitch of 3-D connection. Results show that different partitioning of the systolic array and SRAM buffers in a four-tier 3-D configuration can lead to either 1.1–3.9× latency reduction or 1–3× energy reduction compared to the baseline design of the same 2-D area footprint. It is also shown that by carefully organizing the systolic array and SRAM tiers using logic over memory, the temperature rise with 3-D across benchmarks can be limited to 6 °C.

Keywords