IEEE Access (Jan 2025)
CNN Accelerator Performance Dependence on Loop Tiling and the Optimum Resource-Constrained Loop Tiling
Abstract
This paper analyzes how the performance of a convolutional neural network (CNN) accelerator depends on loop tiling. Specifically, starting from a closed-form expression of the accelerator performance, the dependence on the tile sizes is characterized by the derivative, the asymptote, and the switching point between the computation-limited and communication-limited conditions. The analysis shows how to choose tile sizes that achieve the required performance while avoiding an unnecessary increase in static random access memory (SRAM) size. The paper also addresses the optimum resource-constrained loop tiling for CNN accelerators: given a constraint on either the on-chip buffer size or the multiply-accumulate (MAC) array size, the tile sizes are optimized to maximize performance. The closed-form expressions of the optimum tile sizes provide insight into how to allocate the available hardware resources for maximum performance. In the performance evaluation, the proposed tile sizes achieve nearly the maximum performance, so tile sizes can be optimized without exhaustive search, which speeds up design space exploration.
Keywords