SH-GAT: Software-hardware co-design for accelerating graph attention networks on FPGA

Renping Wang; Shun Li; Enhao Tang; Sen Lan; Yajing Liu; Jing Yang; Shizhen Huang; Hailong Hu

doi:10.3934/era.2024105

Electronic Research Archive (Mar 2024)

Affiliations

Renping Wang: 1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
Shun Li: 1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
Enhao Tang: 1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
Sen Lan: 2. College of Science, Shantou University, Shantou 515603, China
Yajing Liu: 1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
Jing Yang: 1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
Shizhen Huang: 1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
Hailong Hu: 1. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China

DOI: https://doi.org/10.3934/era.2024105
Journal volume & issue: Vol. 32, no. 4, pp. 2310–2322

Abstract

Graph convolution networks (GCN) have demonstrated success in learning graph structures; however, they are limited in inductive tasks. Graph attention networks (GAT) were proposed to address the limitations of GCN and have shown high performance in graph-based tasks. Despite this success, GAT faces challenges in hardware acceleration, including: 1) the GAT algorithm is difficult to adapt to hardware; 2) sparse matrix multiplication (SPMM) is difficult to implement efficiently; and 3) irregular memory accesses cause complex addressing and pipeline stalls. To this end, this paper proposed SH-GAT, an FPGA-based GAT accelerator that achieves more efficient GAT inference. The proposed approach employed several optimizations to enhance GAT performance. First, this work optimized the GAT algorithm using split weights and softmax approximation to make it more hardware-friendly. Second, a load-balanced SPMM kernel was designed to fully leverage potential parallelism and improve data throughput. Lastly, data preprocessing was performed by pre-fetching the source node and its neighbor nodes, effectively addressing the pipeline stall and complex addressing issues arising from irregular memory access. SH-GAT was evaluated on the Xilinx FPGA Alveo U280 accelerator card with three popular datasets. Compared to existing CPU, GPU, and state-of-the-art (SOTA) FPGA-based accelerators, SH-GAT achieves speedups of up to 3283$\times$, 13$\times$, and 2.3$\times$, respectively.
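The abstract does not spell out what "split weights" means in detail. One plausible reading, sketched below under that assumption, is the standard identity for the GAT attention logit: the attention vector applied to the concatenation of two transformed node features equals the sum of two half-vectors applied to each feature separately. This lets a per-node score be computed once and reused for every edge. All names (`a_src`, `a_dst`, `node_scores`) and the toy values are illustrative, and the exact softmax is used here rather than the approximation the paper refers to.

```python
import math

def matvec(W, h):
    """Dense matrix-vector product W @ h using plain lists."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def logit_concat(a, W, h_i, h_j):
    """Textbook GAT logit: LeakyReLU(a^T [W h_i || W h_j])."""
    z = matvec(W, h_i) + matvec(W, h_j)  # list concatenation
    return leaky_relu(dot(a, z))

def node_scores(a, W, features):
    """Split a into two halves and precompute one scalar pair per node."""
    d = len(W)  # output feature dimension
    a_src, a_dst = a[:d], a[d:]
    transformed = [matvec(W, h) for h in features]
    src = [dot(a_src, t) for t in transformed]
    dst = [dot(a_dst, t) for t in transformed]
    return src, dst

def softmax(xs):
    """Exact, numerically stable softmax (not the paper's approximation)."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

# Toy data (assumed values): 3 nodes, 3 input features, 2 output features.
W = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]]
a = [0.3, -0.7, 0.9, 0.1]
features = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0], [2.0, 0.5, -1.0]]
neighbors_of_0 = [0, 1, 2]

src, dst = node_scores(a, W, features)
split_logits = [leaky_relu(src[0] + dst[j]) for j in neighbors_of_0]
concat_logits = [logit_concat(a, W, features[0], features[j])
                 for j in neighbors_of_0]
attention = softmax(split_logits)
```

With the split form, each edge costs one addition and one LeakyReLU instead of a dot product over the concatenated vector, which is the kind of saving that tends to suit a hardware pipeline.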

Published in Electronic Research Archive

ISSN: 2688-1594 (Online)
Publisher: AIMS Press
Country of publisher: United States
LCC subjects: Science: Mathematics; Technology: Technology (General): Industrial engineering. Management engineering: Applied mathematics. Quantitative methods
Website: https://www.aimspress.com/journal/era
