Electronics Letters (Feb 2024)

A Conv‐GEMM reconfigurable accelerator with WS‐RS dataflow for high throughput processing

  • Feihu Wang,
  • Chi Zhang,
  • Yongchao Deng,
  • Xu Yang,
  • Shuangming Yu,
  • Runjiang Dou,
  • Nanjian Wu,
  • Liyuan Liu

DOI
https://doi.org/10.1049/ell2.13125
Journal volume & issue
Vol. 60, no. 3

Abstract

Convolution and matrix operations are both important computations in Deep Neural Networks (DNNs). However, the significant differences between their computation patterns make it challenging to support both convolution (Conv) and general matrix multiplication (GEMM) efficiently in a single hardware design. This paper proposes a Conv‐GEMM reconfigurable accelerator architecture for high‐throughput edge processing. A weight stationary‐row streaming (WS‐RS) dataflow scheme is proposed, which maximizes data reuse through hierarchical memory structures and flexible PE connections, and supports high‐throughput deep‐learning algorithms at the edge. Based on the proposed dataflow, a multi‐scale memory access network (MMAN), a reconfigurable accumulator array (RAA), and a configurable instruction set architecture (ISA) are designed to optimize computation throughput and energy efficiency. The accelerator, implemented in 65 nm technology, achieves a peak performance of 1.15 TOPS at 250 MHz with an energy efficiency of 1.14 TOPS/W. GEMM computation achieves an 85.7% latency improvement, and MobileNet‐V1 processing reaches a throughput of 529 fps with a 256 × 224 input image and 87.15% top‐5 accuracy on the ImageNet dataset.
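
To make the weight-stationary idea behind the WS‐RS dataflow concrete, the following is a minimal Python sketch of a weight-stationary GEMM loop nest: each weight tile is loaded once and held fixed while input rows stream past it, accumulating partial sums. The PE array size, tile shapes, and function names here are hypothetical illustrations only; the abstract does not detail the accelerator's actual PE array, memory hierarchy, or row-streaming schedule.

# Illustrative sketch only: a weight-stationary loop nest for GEMM,
# echoing the reuse pattern the abstract attributes to the WS-RS dataflow.
# PE_ROWS, PE_COLS and all tiling choices are assumptions, not the paper's design.
import numpy as np

PE_ROWS, PE_COLS = 16, 16   # assumed PE array dimensions (not from the paper)

def ws_gemm(A: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Compute C = A @ W tile by tile.

    Each weight tile is kept 'stationary' while all corresponding rows of A
    stream through it, maximizing weight reuse before the next tile is loaded.
    """
    M, K = A.shape
    K2, N = W.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for k0 in range(0, K, PE_ROWS):           # weight tile along K
        for n0 in range(0, N, PE_COLS):       # weight tile along N
            w_tile = W[k0:k0 + PE_ROWS, n0:n0 + PE_COLS]   # stays resident
            for m in range(M):                # input rows stream past the tile
                a_row = A[m, k0:k0 + PE_ROWS]
                C[m, n0:n0 + PE_COLS] += a_row @ w_tile    # accumulate partial sums
    return C

# Quick check against a reference GEMM
A = np.random.randn(64, 48).astype(np.float32)
W = np.random.randn(48, 32).astype(np.float32)
assert np.allclose(ws_gemm(A, W), A @ W, atol=1e-4)

In a hardware realization, the innermost accumulation would map onto the PE array and the reconfigurable accumulator array rather than a NumPy dot product; the sketch only conveys the loop ordering that keeps weights resident while activations stream.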

Keywords