IEEE Access (Jan 2021)

System-Level Communication Performance Estimation for DMA-Controlled Accelerators

  • Sunwoo Kim,
  • Sungkyung Park,
  • Chester Sungchung Park

DOI
https://doi.org/10.1109/ACCESS.2021.3119516
Journal volume & issue
Vol. 9
pp. 141389 – 141402

Abstract

The performance of a hardware accelerator is often limited by the communication bandwidth between local on-chip memories and DRAM across the on-chip bus. In this paper, a system-level performance estimation algorithm is proposed for evaluating the communication performance of direct memory access (DMA) controlled accelerators. The proposed algorithm estimates the communication performance accurately for both DRAM-limited and bus-limited cases. Specifically, the communication performance for the DRAM-limited case is estimated using dynamic prediction of DRAM command patterns, whereas the communication performance for the bus-limited case is estimated based on the maximum bus burst latency. The algorithm first determines whether the communication bandwidth is limited by the bus protocol overhead or by the DRAM latency, and then estimates the communication bandwidth of a DMA-controlled accelerator according to that bottleneck. It is shown that the proposed algorithm significantly improves the estimation accuracy when it is applied to CNNs and wireless communications. In addition, when the proposed algorithm is used together with a full-system simulator to explore a design space defined by a set of tile sizes and bus-related parameters, it achieves a speedup of more than a factor of 100 over conventional algorithms by filtering out a large number of unpromising design points. It is also shown that the proposed algorithm alone can approach the maximum accelerator performance with a performance degradation of less than 5%. An ablation study is conducted to demonstrate the efficacy of the individual steps of the proposed algorithm.
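The bottleneck selection and design-point filtering described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the DRAM-limited estimate here uses a fixed per-burst cycle count in place of the dynamic prediction of DRAM command patterns, and every name and number (`DesignPoint`, `estimate_bandwidth`, `filter_design_points`, `keep_ratio`, the cycle counts) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    """One candidate configuration; all fields are illustrative assumptions."""
    name: str
    burst_bytes: int        # payload moved by one bus burst
    bus_burst_cycles: int   # assumed maximum latency of one bus burst
    dram_burst_cycles: int  # assumed DRAM service time for the same burst


def estimate_bandwidth(p: DesignPoint):
    """Return (bottleneck, bytes per cycle) for a design point."""
    bus_bw = p.burst_bytes / p.bus_burst_cycles
    dram_bw = p.burst_bytes / p.dram_burst_cycles
    # The slower of the two paths bounds the achievable bandwidth.
    return ("bus", bus_bw) if bus_bw <= dram_bw else ("dram", dram_bw)


def filter_design_points(points, keep_ratio=0.9):
    """Keep points whose estimate is within keep_ratio of the best estimate."""
    estimates = {p.name: estimate_bandwidth(p)[1] for p in points}
    best = max(estimates.values())
    return [p for p in points if estimates[p.name] >= keep_ratio * best]


# Made-up design points, not data from the paper.
points = [
    DesignPoint("a", burst_bytes=64, bus_burst_cycles=16, dram_burst_cycles=32),
    DesignPoint("b", burst_bytes=128, bus_burst_cycles=32, dram_burst_cycles=16),
    DesignPoint("c", burst_bytes=64, bus_burst_cycles=64, dram_burst_cycles=16),
]
for p in points:
    print(p.name, estimate_bandwidth(p))
print([p.name for p in filter_design_points(points)])
```

In a flow like the one the abstract describes, only the points that survive such a filter would be passed on to the slower full-system simulator; the threshold used here is an arbitrary choice.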

Keywords