DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training

Angelo Garofalo; Yvan Tortorella; Matteo Perotti; Luca Valente; Alessandro Nadalini; Luca Benini; Davide Rossi; Francesco Conti

doi:10.1109/OJSSCS.2022.3210082

IEEE Open Journal of the Solid-State Circuits Society (Jan 2022)

Affiliations

Angelo Garofalo: Department of Electrical, Electronic and Information Engineering, University of Bologna, Bologna, Italy
Yvan Tortorella: Department of Electrical, Electronic and Information Engineering, University of Bologna, Bologna, Italy
Matteo Perotti: IIS Integrated Systems Laboratory, ETH Zürich, Zürich, Switzerland
Luca Valente: Department of Electrical, Electronic and Information Engineering, University of Bologna, Bologna, Italy
Alessandro Nadalini: Department of Electrical, Electronic and Information Engineering, University of Bologna, Bologna, Italy
Luca Benini: Department of Electrical, Electronic and Information Engineering, University of Bologna, Bologna, Italy
Davide Rossi: Department of Electrical, Electronic and Information Engineering, University of Bologna, Bologna, Italy
Francesco Conti: Department of Electrical, Electronic and Information Engineering, University of Bologna, Bologna, Italy

DOI: https://doi.org/10.1109/OJSSCS.2022.3210082
Journal volume & issue: Vol. 2, pp. 231–243

Abstract

On-chip deep neural network (DNN) inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy, and flexibility requirements. Heterogeneous clusters are promising solutions to meet the challenge, combining the flexibility of DSP-enhanced cores with the performance and energy boost of dedicated accelerators. We present DARKSIDE, a System-on-Chip with a heterogeneous cluster of eight RISC-V cores enhanced with 2-b to 32-b mixed-precision integer arithmetic. To boost the performance and efficiency on key compute-intensive DNN kernels, the cluster is enriched with three digital accelerators: 1) a specialized engine for low-data-reuse depthwise convolution kernels (up to 30 MAC/cycle); 2) a minimal overhead datamover to marshal 1–32-b data on-the-fly; and 3) a 16-b floating-point tensor product engine (TPE) for tiled matrix-multiplication acceleration. DARKSIDE is implemented in 65-nm CMOS technology. The cluster achieves a peak integer performance of 65 GOPS and a peak efficiency of 835 GOPS/W when working on 2-b integer DNN kernels. When targeting floating-point tensor operations, the TPE provides up to 18.2 GFLOPS of performance or 300 GFLOPS/W of efficiency—enough to enable on-chip floating-point training at competitive speed coupled with ultralow power quantized inference.

Published in IEEE Open Journal of the Solid-State Circuits Society

ISSN: 2644-1349 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electric apparatus and materials. Electric circuits. Electric networks
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=8782712
