IEEE Access (Jan 2024)
High-Level Design of Precision-Scalable DNN Accelerators Based on Sum-Together Multipliers
Abstract
Precision-scalable (PS) multipliers are gaining traction in Deep Neural Network (DNN) accelerators, particularly for enabling mixed-precision (MP) quantization in Deep Learning at the edge. This paper focuses on the Sum-Together (ST) class of PS multipliers, which are subword-parallel multipliers that can execute either a standard multiplication at full precision or a dot-product on parallel low-precision operands. Our contributions cover several aspects: we enrich our previous comparison of state-of-the-art (SoA) ST multipliers by including our recent radix-4 Booth ST multiplier and two novel designs; we extend the explanation of the architecture and design flow of our previously proposed ST-based PS hardware accelerators for 2D-Convolution, Depth-wise Convolution, and Fully-Connected layers, which we developed using High-Level Synthesis (HLS); we implement the uniform integer quantization equations in hardware; we conduct a broad HLS-driven design space exploration of our ST-based accelerators, varying numerous hardware parameters; finally, we showcase the advantages of ST-based accelerators when integrated into System-on-Chips (SoCs) in three different scenarios (low-area, low-power, and low-latency), running inference on MP-quantized MLPerf Tiny models as a case study. Across the three scenarios, the results show average latency speedups of 1.46x, 1.33x, and 1.29x, reduced energy consumption in most cases, and marginal area overheads of 0.9%, 2.5%, and 8.0%, compared to SoCs with accelerators based on fixed-precision 16-bit multipliers. In summary, our work provides a comprehensive understanding of the performance of ST-based accelerators in an SoC context, paving the way for future enhancements and for resolving the inefficiencies identified.
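To make the ST operating modes concrete, the following Python sketch gives a purely behavioral model of a sum-together multiplier: at full precision it returns an ordinary product, while at a lower precision it treats each input word as packed subwords and returns their dot product. It is not the hardware design described in this paper. The function names (`split_subwords`, `pack_subwords`, `st_multiply`), the 16-bit word width, the 8-bit and 4-bit modes, and the signed two's-complement operands are all assumptions chosen for illustration.

```python
def split_subwords(word: int, total_bits: int, sub_bits: int) -> list[int]:
    """Split a packed word into signed two's-complement subwords, LSB first."""
    assert total_bits % sub_bits == 0, "subword width must divide the word width"
    mask = (1 << sub_bits) - 1
    sign_bit = 1 << (sub_bits - 1)
    word &= (1 << total_bits) - 1  # keep only the low total_bits bits
    subwords = []
    for i in range(total_bits // sub_bits):
        field = (word >> (i * sub_bits)) & mask
        # Reinterpret the raw field as a signed value.
        subwords.append(field - (1 << sub_bits) if field & sign_bit else field)
    return subwords


def pack_subwords(subwords: list[int], sub_bits: int) -> int:
    """Pack signed subwords (LSB first) into one word; inverse of split_subwords."""
    mask = (1 << sub_bits) - 1
    word = 0
    for i, value in enumerate(subwords):
        word |= (value & mask) << (i * sub_bits)
    return word


def st_multiply(a_word: int, b_word: int, precision: int, total_bits: int = 16) -> int:
    """Behavioral sum-together multiply (illustrative assumption, not the paper's RTL).

    precision == total_bits : one full-precision signed product.
    precision <  total_bits : dot product of the packed low-precision subwords.
    """
    a = split_subwords(a_word, total_bits, precision)
    b = split_subwords(b_word, total_bits, precision)
    return sum(x * y for x, y in zip(a, b))


# Full precision: a single 16-bit x 16-bit signed multiplication.
full = st_multiply(pack_subwords([-300], 16), pack_subwords([25], 16), precision=16)

# 8-bit mode: two products summed together, 3*5 + (-2)*7.
dot8 = st_multiply(pack_subwords([3, -2], 8), pack_subwords([5, 7], 8), precision=8)

# 4-bit mode: four products summed together.
dot4 = st_multiply(pack_subwords([1, -2, 3, -4], 4),
                   pack_subwords([5, 6, -7, 7], 4), precision=4)
```

In this model, the same pair of 16-bit input words yields one, two, or four multiply-accumulate results per call depending on the selected precision, which is the property that lets an ST-based accelerator trade operand precision for throughput.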
Keywords