Deep Learning Accelerators’ Configuration Space Exploration Effect on Performance and Resource Utilization: A Gemmini Case Study

Dennis Agyemanh Nana Gookyi; Eunchong Lee; Kyungho Kim; Sung-Joon Jang; Sang-Seol Lee

doi:10.3390/s23052380

Sensors (Feb 2023)

Deep Learning Accelerators’ Configuration Space Exploration Effect on Performance and Resource Utilization: A Gemmini Case Study

Dennis Agyemanh Nana Gookyi,
Eunchong Lee,
Kyungho Kim,
Sung-Joon Jang,
Sang-Seol Lee

Affiliations

Dennis Agyemanh Nana Gookyi: Electronics Division, Institute for Scientific and Technological Information, Council for Scientific and Industrial Research, Accra, Ghana
Eunchong Lee: Intelligent Image Processing Research Center, Korea Electronics Technology Institute, Seongnam-si 13488, Republic of Korea
Kyungho Kim: Intelligent Image Processing Research Center, Korea Electronics Technology Institute, Seongnam-si 13488, Republic of Korea
Sung-Joon Jang: Intelligent Image Processing Research Center, Korea Electronics Technology Institute, Seongnam-si 13488, Republic of Korea
Sang-Seol Lee: Intelligent Image Processing Research Center, Korea Electronics Technology Institute, Seongnam-si 13488, Republic of Korea

DOI: https://doi.org/10.3390/s23052380
Journal volume & issue: Vol. 23, no. 5
p. 2380

Abstract

Read online

Though custom deep learning (DL) hardware accelerators are attractive for making inferences in edge computing devices, their design and implementation remain a challenge. Open-source frameworks exist for exploring DL hardware accelerators. Gemmini is an open-source systolic array generator for agile DL accelerator exploration. This paper details the hardware/software components generated using Gemmini. The general matrix-to-matrix multiplication (GEMM) of different dataflow options, including output/weight stationary (OS/WS), was explored in Gemmini to estimate the performance relative to a CPU implementation. The Gemmini hardware was implemented on an FPGA device to explore the effect of several accelerator parameters, including array size, memory capacity, and the CPU/hardware image-to-column (im2col) module, on metrics such as the area, frequency, and power. This work revealed that regarding the performance, the WS dataflow offered a speedup of 3× relative to the OS dataflow, and the hardware im2col operation offered a speedup of 1.1× relative to the operation on the CPU. For hardware resources, an increase in the array size by a factor of 2 led to an increase in both the area and power by a factor of 3.3, and the im2col module led to an increase in area and power by factors of 1.01 and 1.06, respectively.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors

About the journal

Abstract

Keywords